Abstract

We consider a decision network on an undirected graph in which each node corresponds to 
a decision variable, and each node and edge of the graph is associated with a reward func- 
tion whose value depends only on the variables of the corresponding nodes. The goal is to 
construct a decision vector which maximizes the total reward. This decision problem encom- 
passes a variety of models, including maximum-likelihood inference in graphical models (Markov 
Random Fields), combinatorial optimization on graphs, economic team theory and statistical 
physics. The network is endowed with a probabilistic structure in which costs are sampled from 
a distribution. Our aim is to identify sufficient conditions on the network structure and cost
distributions to guarantee average-case polynomiality of the underlying optimization problem.
Additionally, we wish to characterize the efficiency of a decentralized solution generated on the
basis of local information. 

We construct a new decentralized algorithm called Cavity Expansion and establish its theo- 
retical performance for a variety of graph models and reward function distributions. Specifically, 
for certain classes of models we prove that our algorithm is able to find near optimal solutions 
with high probability in a decentralized way. The success of the algorithm is based on the net- 
work exhibiting a certain correlation decay (long-range independence) property and we prove 
that this property is indeed exhibited by the models of interest. Our results have the following 
surprising implications in the area of average case complexity of algorithms. Finding the largest 
independent (stable) set of a graph is a well known NP-hard optimization problem for which 
no polynomial time approximation scheme is possible even for graphs with largest connectivity 
equal to three, unless P=NP. Yet we show that the closely related maximum weighted indepen- 
dent set problem for the same class of graphs admits a PTAS when the weights are independent 
identically distributed with the exponential distribution. Namely, randomization of the reward 
function turns an NP-hard problem into a tractable one. 



1 Introduction and literature review 

We consider a team of agents working in a networked structure $(V, E)$, where $V$ is the set of agents
and $E$ is the set of edges of the network, each edge indicating potential local interactions between
agents. Each agent $v$ has to make a decision $x_v$ from a finite set, and the team incurs a total reward
$F = \sum_{v \in V} \Phi_v(x_v) + \sum_{(u,v) \in E} \Phi_{u,v}(x_u, x_v)$. The goal of each agent is to choose its decision $x_v$ so that
the total reward $F$ is maximized. This model subsumes many models in a variety of fields including
economic team theory, statistical inference, combinatorial optimization on graphs and statistical 
physics. 

As an example, common models in the area of statistical inference are graphical models, 
Bayesian networks, and Markov Random Fields (MRF) (see [WJ08] for an overview of inference
techniques for graphical models, and [MM08, HW05] for a comprehensive study of the relations
between statistical physics, statistical inference, and combinatorial optimization). One of the key 
objects in such a model is the state which achieves the mode of the density, namely, the state which 
maximizes the a priori likelihood. The problem of finding such a state can be cast as a problem 
defined above. 

In economic team theory (see [Mar55, Rad62, MR72]), an interesting question was raised
in [RR01]: what is the cost of decentralization in a chain of agents? In other words, if we assume
that each node only receives local information on the network topology and costs, what kind of
performance can the team attain? Cast in our framework, this is the problem of finding the maximum
of $F(x)$ by means of local (decentralized) algorithms.

Combinatorial optimization problems typically involve the task of finding a solution which 
minimizes or maximizes some objective function subject to various constraints supported by the 
underlying graph. Examples include the problem of finding a largest independent set, minimum 
and maximum cut problems, max-KSAT problems, etc. Finding an optimal solution in many such 
problems is a special case of the problem of finding $\max_x F(x)$ described above.

Finally, a key object in statistical physics models is the so-called ground state - a state which 
achieves the minimum possible energy. Again, finding such an object reduces to solving the problem 
described above, namely solving the problem $\max_x F(x)$ ($\min_x -F(x)$ to be more precise).

The combinatorial optimization nature of the decision problem $\max_x F(x)$ implies that the
problem of finding $x^* = \arg\max_x F(x)$ is generally NP-hard, even for the special case when the
decision space for each agent consists only of two elements. This motivates a search for approximate 
methods which find solutions that theoretically or empirically achieve good proximity to optimality. 
Such methods usually differ from field to field. In combinatorial optimization the focus has been on 
developing methods which achieve some provably guaranteed approximation level using a variety 
of approaches, including linear programming, semi-definite relaxations and purely combinatorial 
methods [Hoc97]. In the area of graphical models, researchers have been developing new families
of distributed inference algorithms. One of the most studied techniques is the Belief Propagation
(BP) algorithm [Lau96, Dor04, YFW00]. Since the algorithm proposed in the present paper bears
some similarity to, and is motivated by, the BP algorithm, we provide below a brief summary of known
theoretical facts about BP.

The BP algorithm is known to find an optimal solution x* when the underlying graph is a tree, 
but may fail to converge, let alone produce an optimal (or correct) solution when the underlying 
graph contains cycles. Despite this fact, it often has excellent empirical performance. Also, in some 
cases, BP can be proven to produce an optimal solution, even when the underlying graph contains 
cycles. In a framework similar to ours, Moallemi and Van Roy [MR09] show that BP converges
and produces an optimal solution when the action space is continuous and the cost functions
$\Phi_{u,v}$ and $\Phi_v$ are quadratic and convex. Some generalizations to general convex functions are
obtained in [MR07]. Other cases where BP produces optimal solutions include Maximum Weighted
Bipartite Matching [San07, BBCZ08, BSS08] (for matchings), Maximum Weighted Independent
Set problems where the LP relaxation is tight [SSW07], network flow problems [GSW09], and
more generally, optimization problems defined on totally unimodular constraint matrices [Che08].

In this paper, we propose a new message-passing like algorithm for the problem of finding 
$x^* = \arg\max_x F(x)$, which we call the Cavity Expansion (CE) algorithm, and obtain sufficient
conditions for the asymptotic optimality of our algorithm based on the so-called correlation decay
property. Our algorithm draws upon several recent ideas. On the one hand, we rely on a technique
used recently for constructing approximate counting algorithms. Specifically, Bandyopadhyay and
Gamarnik [BG08] and Weitz [Wei06] proposed approximate counting algorithms which are based
on local (in the graph-theoretic sense) computation. Provided that the model exhibits a form of
correlation decay, these local computations suffice for approximate counting. The approach was later
extended by Gamarnik and Katz [GK07b, GK07a], Bayati et al. [BGK+07], and Jung and Shah [JS07].
The present work develops a similar approach, but for optimization problems. The description
of the CE algorithm begins by introducing the notion of a cavity $B_v(x)$ for each node/decision pair
$(v, x)$ (the notion of cavity was heavily used recently in the statistical physics literature [MP03,
RBMM04]). It is also called bonus in the relevant papers [Ald01], [XS03], [GNS06]. $B_v(x)$ is defined
as the difference between the optimal reward for the entire network when the action in $v$ is $x$ versus
the optimal reward when the action in the same node is 0 (any other base action can be taken
instead of 0). It is easily shown that knowing $B_v(x)$ is equivalent to solving the original decision
problem. We obtain a recursion expressing the cavity $B_v(x)$ in terms of cavities of the neighbors
of $v$ in suitably modified sub-networks of the underlying network. The algorithm then proceeds
by expanding this recursion in a breadth-first search manner for some designated number of steps
$t$, thus constructing an associated computation tree with depth $t$. At the initialization point the
cavity values are assigned some default value. Then the approximation value $\hat{B}_v(x)$ is computed
using this computation tree. If this computation was conducted for $t$ equalling roughly the length
$L$ of the longest self-avoiding path of the graph, it would result in exact computation of the cavity
values $B_v(x)$. Yet the computation effort associated with this scheme is exponential in $L$, which
itself often grows linearly with the size of the graph.

The key insight of our work is that in many cases, the dependence of the cavity $B_v(x)$ on cavities
associated with other nodes in the computation tree dies out exponentially fast as a function of
the distance between the nodes. This phenomenon is generally called correlation decay. In earlier
work [Ald92, Ald01, AS03, GNS06, GG09], it is shown that some optimization problems on locally
tree-like graphs with random costs are tractable as they exhibit the correlation decay property.
This is precisely our approach: we show that if we compute $\hat{B}_v(x)$ based on the computation tree
with only constant depth $t$, the resulting error $\hat{B}_v(x) - B_v(x)$ is exponentially small in $t$. By taking
$t = O(\log(1/\epsilon))$ for any target accuracy $\epsilon$, this approach leads to an $\epsilon$-approximation scheme for
computing the optimal reward $\max_x F(x)$. Thus, the main associated technical goal is establishing
the correlation decay property for the associated computation tree. 

We indeed establish that the correlation decay property holds for several classes of decision
networks associated with random reward functions $\Phi = (\Phi_v, \Phi_{u,v})$. Specifically, we give concrete
results for the cases of uniform and Gaussian distributed functions for unconstrained optimization in
networks with bounded connectivity (graph degree) $\Delta$. We also consider exponentially distributed
(with parameter 1) weights for the Maximum Weighted Independent Set (MWIS) problem. In this setting,
the combination of CE (a message passing style algorithm) and a randomized setting has a particularly
interesting implication for the theory of average case analysis of combinatorial optimization.
Unlike some other NP-hard problems, finding the MWIS of a graph does not admit a constant factor
approximation algorithm for general graphs: Hastad [Has96] showed that for every $0 < \delta < 1$ no
$n^{1-\delta}$ approximation algorithm can exist for this problem unless P = NP, where $n$ is the number of
nodes. Even for the class of graphs with degree at most 3, no factor 1.0071 approximation algorithm
can exist, under the same complexity-theoretic assumption, see Berman and Karpinski [BK98]. In
contrast, we show that when $\Delta \le 3$ and the node weights are independently generated with a parameter
1 exponential distribution, the problem of finding the maximum weighted independent set admits
a PTAS. Thus, surprisingly, introducing random weights translates a combinatorially intractable
problem into a tractable one. We further extend these results to the case $\Delta > 3$, but for different
node weight distributions.

The rest of the paper is organized as follows. In Section 2 we describe the general model and
notations. In Section 3 we present our main results. In Section 4 we derive the cavity recursion,
an exact recursion for computing the cavity of a node in a decision network, and from it develop
the Cavity Expansion algorithm. In Section 5 we prove that the correlation decay property implies
optimality of the cavity recursion and local optimality of the solution. The rest of the paper is
devoted to identifying sufficient conditions for correlation decay (and hence, optimality of the CE
algorithm): in Section 6 we show how a coupling argument can be used to prove the correlation
decay property for the case of uniform and Gaussian weight distributions, and in Section 7 we
establish the correlation decay property for the MWIS problem using a different argument based
on monotonicity. Concluding thoughts are in Section 8.

2 Model description and notations 

Consider a decision network $\mathcal{G} = (V, E, \Phi, \chi)$. Here $(V, E)$ is an undirected simple graph in which
each node $u \in V$ represents an agent, and edges $e \in E$ represent a possible interaction between
two agents. Each agent makes a decision $x_u \in \chi = \{0, 1, \ldots, T-1\}$. For every $v \in V$, a function
$\Phi_v : \chi \to \mathbb{R}$ is given. Also for every edge $e = (u, v)$ a function $\Phi_e : \chi^2 \to \mathbb{R} \cup \{-\infty\}$ is given. The
inclusion of $-\infty$ into the range of $\Phi_e$ is needed in order to model the "hard constraints" in the MWIS
problem, which prohibit both ends of an edge from belonging to an independent set. Functions $\Phi_v$ and $\Phi_e$ will
be called potential functions and interaction functions respectively. Let $\Phi = ((\Phi_v)_{v \in V}, (\Phi_e)_{e \in E})$.
A vector $x = (x_1, x_2, \ldots, x_{|V|})$ of actions is called a solution for the decision network. The value
of solution $x$ is defined to be $F_{\mathcal{G}}(x) = \sum_{(u,v) \in E} \Phi_{u,v}(x_u, x_v) + \sum_{v \in V} \Phi_v(x_v)$. The quantity
$J_{\mathcal{G}} = \max_x F_{\mathcal{G}}(x)$ is called the (optimal) value of the network $\mathcal{G}$. A decision $x$ is optimal if $F_{\mathcal{G}}(x) = J_{\mathcal{G}}$.
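
To make the definition of the value and the optimal value of a network concrete, the following minimal sketch evaluates the total reward of a solution and finds the optimal value by exhaustive enumeration on a two-node network. The function names, the dictionary layout of the reward tables, and the numerical rewards are illustrative assumptions, not part of the paper; the enumeration is exponential in the number of nodes and is only meant for tiny instances.

```python
import itertools
import math

def total_reward(x, edges, phi_node, phi_edge):
    """Value of solution x: node potentials plus edge interaction terms."""
    value = sum(phi_node[v][x[v]] for v in phi_node)
    value += sum(phi_edge[(u, v)][x[u]][x[v]] for (u, v) in edges)
    return value

def brute_force_optimum(nodes, edges, phi_node, phi_edge, T):
    """Optimal value and one optimal solution, by enumerating all T^|V| solutions."""
    best_value, best_x = -math.inf, None
    for actions in itertools.product(range(T), repeat=len(nodes)):
        x = dict(zip(nodes, actions))
        value = total_reward(x, edges, phi_node, phi_edge)
        if value > best_value:
            best_value, best_x = value, x
    return best_value, best_x

# Hypothetical two-agent network with binary decisions (T = 2).
nodes = [0, 1]
edges = [(0, 1)]
phi_node = {0: [0.0, 1.0], 1: [0.0, -0.5]}      # phi_node[v][x_v]
phi_edge = {(0, 1): [[0.25, 0.0], [-1.0, 2.0]]}  # phi_edge[(u, v)][x_u][x_v]

J, x_star = brute_force_optimum(nodes, edges, phi_node, phi_edge, T=2)
print(J, x_star)
```

In this toy instance node 1 alone would prefer action 0, but the interaction reward for both agents choosing 1 dominates, which is the kind of coupling that makes the problem hard in general.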

In a Markov Random Field (MRF), a set of random variables $X = (X_1, \ldots, X_n)$ is assigned
a probability $\mathbb{P}(X = x)$ proportional to $\exp(F_{\mathcal{G}}(x))$. In this context, the quantity $F_{\mathcal{G}}(x)$ can
be considered as the log-likelihood of assignment $x$, and maximizing it corresponds to finding a
maximum a posteriori assignment of the MRF defined by $F_{\mathcal{G}}$.

The main focus of this paper will be on the case where $\Phi_v(x)$, $\Phi_e(x, y)$ are random variables
(however, the actual realizations of the random variables are observed by the agents, and their
decisions depend on the values $\Phi_v(x)$ and $\Phi_e(x, y)$). While we will usually assume independence of
these random variables when $v$ and $e$ vary, we will allow dependence for the same $v$ and $e$ when we
vary the decisions $x, y$. The details will be discussed when we proceed to concrete examples.

2.1 Examples

2.1.1 Independent set 

Suppose the nodes of the graph are equipped with weights $W_v > 0$, $v \in V$. A set of nodes
$I \subset V$ is an independent set if $(u, v) \notin E$ for every $u, v \in I$. The weight of an (independent)
set $I$ is $\sum_{u \in I} W_u$. The maximum weight independent set problem is the problem of finding the
independent set $I$ with the largest weight. It can be recast as a decision network problem by setting

$\chi = \{0, 1\}$, $\Phi_e(0, 0) = \Phi_e(0, 1) = \Phi_e(1, 0) = 0$, $\Phi_e(1, 1) = -\infty$, $\Phi_v(1) = W_v$, $\Phi_v(0) = 0$.
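
As a sanity check of this encoding, the sketch below builds the potentials for a small weighted graph and verifies that the optimal value of the resulting decision network equals the weight of the heaviest independent set found by direct enumeration. The helper names and the 4-cycle instance are assumptions made for illustration only.

```python
import itertools
import math

def mwis_potentials(weights, edges):
    """Node and edge reward tables for the MWIS encoding (binary decisions)."""
    phi_node = {v: [0.0, w] for v, w in weights.items()}
    hard = [[0.0, 0.0], [0.0, -math.inf]]  # -inf forbids picking both endpoints
    phi_edge = {e: hard for e in edges}
    return phi_node, phi_edge

def network_optimum(weights, edges):
    """Optimal value of the decision network, by exhaustive enumeration."""
    phi_node, phi_edge = mwis_potentials(weights, edges)
    nodes = list(weights)
    best = -math.inf
    for actions in itertools.product((0, 1), repeat=len(nodes)):
        x = dict(zip(nodes, actions))
        value = sum(phi_node[v][x[v]] for v in nodes)
        value += sum(phi_edge[(u, v)][x[u]][x[v]] for (u, v) in edges)
        best = max(best, value)
    return best

def direct_mwis(weights, edges):
    """Weight of the heaviest independent set, enumerating subsets directly."""
    nodes = list(weights)
    best = 0.0
    for k in range(len(nodes) + 1):
        for subset in itertools.combinations(nodes, k):
            chosen = set(subset)
            if all(not (u in chosen and v in chosen) for (u, v) in edges):
                best = max(best, sum(weights[v] for v in subset))
    return best

# Hypothetical instance: a 4-cycle with weights 1, 2, 3, 4.
weights = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(network_optimum(weights, edges), direct_mwis(weights, edges))
```

Any solution that selects both endpoints of an edge receives value $-\infty$, so the maximization never returns a set that violates independence.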

2.1.2 Graph Coloring 

An assignment $\phi$ of nodes $V$ to colors $\{1, \ldots, q\}$ is defined to be a proper coloring if no monochromatic
edges are created. Namely, for every edge $(v, u)$, $\phi(v) \neq \phi(u)$. Suppose each node/color pair
$(v, x) \in V \times \{1, \ldots, q\}$ is equipped with a weight $W_{v,x} > 0$. The (weighted) coloring problem is the
problem of finding a proper coloring $\phi$ with maximum total weight $\sum_v W_{v,\phi(v)}$. In terms of the decision
network framework, we have $\Phi_{v,u}(x, x) = -\infty$, $\Phi_{v,u}(x, y) = 0$, $\forall x \neq y \in \chi = \{1, \ldots, q\}$, $(v, u) \in E$,
and $\Phi_v(x) = W_{v,x}$, $\forall v \in V$, $x \in \chi$.

2.1.3 MAX 2-SAT 

Let $(Z_1, \ldots, Z_n)$ be a set of boolean variables. Let $(C_1, \ldots, C_m)$ be a list of clauses of the form
$(Z_i \vee Z_j)$, $(\bar{Z}_i \vee Z_j)$, $(Z_i \vee \bar{Z}_j)$ or $(\bar{Z}_i \vee \bar{Z}_j)$. The MAX-2SAT problem consists in finding an assignment
for binary variables $Z_i$ which maximizes the number of satisfied clauses $C_j$. In terms of a decision
network, take $V = \{1, \ldots, n\}$, $E = \{(i, j) : Z_i \text{ and } Z_j \text{ appear in a common clause}\}$, and for any
$k$, let $\Phi_k(x, y)$ be 1 if the clause $C_k$ is satisfied when $(Z_i, Z_j) = (x, y)$ and 0 otherwise. Let
$\Phi_v(x) = 0$ for all $v, x$.
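
The clause rewards above can be illustrated with a short sketch that scores an assignment by summing one 0/1 reward per clause and maximizes by enumeration. The tuple format `(i, j, neg_i, neg_j)` for a clause, the zero-based variable indices, and the example instance are assumptions introduced here for illustration.

```python
import itertools

def clause_reward(neg_i, neg_j, x, y):
    """Reward of one clause: 1 if satisfied by Z_i = x, Z_j = y, and 0 otherwise."""
    lit_i = (1 - x) if neg_i else x
    lit_j = (1 - y) if neg_j else y
    return 1 if (lit_i or lit_j) else 0

def max_2sat(n, clauses):
    """Maximize the number of satisfied clauses over all 2^n assignments.

    clauses is a list of (i, j, neg_i, neg_j) with zero-based variable indices.
    """
    best_count, best_z = -1, None
    for z in itertools.product((0, 1), repeat=n):
        count = sum(clause_reward(ni, nj, z[i], z[j]) for (i, j, ni, nj) in clauses)
        if count > best_count:
            best_count, best_z = count, z
    return best_count, best_z

# Hypothetical instance: all four clause types on the same pair of variables.
clauses = [
    (0, 1, False, False),  # (Z0 or Z1)
    (0, 1, True, False),   # (not Z0 or Z1)
    (0, 1, False, True),   # (Z0 or not Z1)
    (0, 1, True, True),    # (not Z0 or not Z1)
]
best_count, best_z = max_2sat(2, clauses)
print(best_count, best_z)
```

Every assignment falsifies exactly one of the four clauses in this instance, so the maximum is three satisfied clauses.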

2.1.4 MAP estimation 

In this example, we see a situation in which the reward functions are naturally randomized. Consider
a graph $(V, E)$ with $|V| = n$ and $|E| = m$, a set of real numbers $p = (p_1, \ldots, p_n) \in [0, 1]^n$, and a
family $(f_1, \ldots, f_m)$ of functions such that for each $(i, j) \in E$, $f_{ij} = f_{ij}(o, x, y) : \mathbb{R} \times \{0, 1\}^2 \to \mathbb{R}_+$
where $o \in \mathbb{R}$ and $x, y \in \{0, 1\}$. Assume that for each $(x, y)$, $f_{ij}(\cdot, x, y)$ is a probability density
function. Consider two sets $C = (C_i)_{1 \le i \le n}$ and $O = (O_j)_{1 \le j \le m}$ of random variables, with joint
probability density

$$p(o, c) = \prod_i p_i^{c_i} (1 - p_i)^{1 - c_i} \prod_{(i,j) \in E} f_{ij}(o_{ij}, c_i, c_j)$$

$C$ is a set of Bernoulli random variables ("causes") with probability $\mathbb{P}(C_i = 1) = p_i$, and $O$ is
a set of continuous "observation" random variables. Conditional on the cause variables $C$, the
observation variables $O$ are independent, and each $O_{ij}$ has density $f_{ij}(o, C_i, C_j)$. Assume the
variables $O$ represent observed measurements used to infer the hidden causes $C$. Using Bayes's
formula, given observations $O$, the log posterior probability of the cause variables $C$ is equal to:

$$\log \mathbb{P}(C = c \mid O = o) = K + \sum_i \Phi_i(c_i) + \sum_{(i,j) \in E} \Phi_{i,j}(c_i, c_j)$$

where

$$\Phi_i(c_i) = \log(p_i / (1 - p_i)) \, c_i$$
$$\Phi_{i,j}(c_i, c_j) = \log(f_{ij}(o_{ij}, c_i, c_j))$$

and where $K$ is a random number which does not depend on $c$. Finding the maximum a posteriori
values of $C$ given $O$ is equivalent to finding the optimal solution of the decision network $\mathcal{G} =
(V, E, \Phi, \{0, 1\})$. Note that the interaction functions $\Phi_{i,j}$ are naturally randomized, since $\Phi_{i,j}(x, y)$
is a continuous random variable with distribution 

dF{^i,jix, y) = t)=e' Yl dF{fij{o, x',y') = e*) 

x',y'£{0,l} 

2.2 Notations 

For any two nodes $u, v$ in $V$, let $d(u, v)$ be the length (number of edges) of the shortest path
between $u$ and $v$. Given a node $u$ and integer $r \ge 0$, let $B_{\mathcal{G}}(u, r) = \{v \in V : d(u, v) \le r\}$
and $\mathcal{N}_{\mathcal{G}}(u) = B(u, 1) \setminus \{u\}$ be the set of neighbors of $u$. For any node $u$, let $\Delta_{\mathcal{G}}(u) = |\mathcal{N}_{\mathcal{G}}(u)|$
be the number of neighbors of $u$ in $\mathcal{G}$. Let $\Delta_{\mathcal{G}}$ be the maximum degree of graph $(V, E)$; namely,
$\Delta_{\mathcal{G}} = \max_v |\mathcal{N}_{\mathcal{G}}(v)|$. Often we will omit the reference to the network $\mathcal{G}$ when it is obvious from the
context.

For any subgraph $(V', E')$ of $(V, E)$ (i.e. $V' \subset V$, $E' \subset E \cap V'^2$), the subnetwork $\mathcal{G}'$ induced by
$(V', E')$ is the network $(V', E', \Phi', \chi)$, where $\Phi' = ((\Phi_v)_{v \in V'}, (\Phi_e)_{e \in E'})$.

Given a subset of nodes $v = (v_1, \ldots, v_k)$, and $x = (x_1, \ldots, x_k) \in \chi^k$, let $J_{\mathcal{G},v}(x)$ be the
optimal value when the actions of nodes $v_1, \ldots, v_k$ are fixed to be $x_1, \ldots, x_k$ respectively:
$J_{\mathcal{G},v}(x) = \max_{x' : x'_{v_i} = x_i, 1 \le i \le k} F_{\mathcal{G}}(x')$. Given $v \in V$ and $x \in \chi$, the quantity $B_{\mathcal{G},v}(x) = J_{\mathcal{G},v}(x) - J_{\mathcal{G},v}(0)$ is called
the cavity of action $x$ at node $v$. Namely, it is the difference of optimal values when the decision
at node $v$ is set to $x$ and 0 respectively (the choice of 0 is arbitrary). The cavity function of $v$
is $B_{\mathcal{G},v} = (B_{\mathcal{G},v}(x))_{x \in \chi}$. Since $B_{\mathcal{G},v}(0) = 0$, $B_{\mathcal{G},v}$ can be thought of as an element of $\mathbb{R}^{T-1}$. In the
important special case $\chi = \{0, 1\}$ the cavity function is a scalar $B_{\mathcal{G},v} = J_{\mathcal{G},v}(1) - J_{\mathcal{G},v}(0)$. In this
case, if $B_{\mathcal{G},v} > 0$ (resp. $B_{\mathcal{G},v} < 0$) then $J_{\mathcal{G},v}(1) > J_{\mathcal{G},v}(0)$ (resp. $J_{\mathcal{G},v}(1) < J_{\mathcal{G},v}(0)$) and action 1 (resp. action 0) is optimal
for $v$. When $B_{\mathcal{G},v} = 0$ there are optimal decisions consistent both with $x_v = 0$ and $x_v = 1$. Again,
when $\mathcal{G}$ is obvious from the context, it will be omitted from the notation.
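
The definition of the cavity can be checked on a small instance by computing the two constrained optima directly. The sketch below does this by enumeration for an MWIS instance on a three-node path; the function names and the instance are illustrative assumptions. It assumes the base action 0 is always feasible, so that the optimum with the node pinned to 0 is finite.

```python
import itertools
import math

def constrained_optimum(nodes, edges, phi_node, phi_edge, T, fixed):
    """Best total reward when the actions listed in `fixed` are pinned."""
    free = [v for v in nodes if v not in fixed]
    best = -math.inf
    for actions in itertools.product(range(T), repeat=len(free)):
        x = dict(fixed)
        x.update(zip(free, actions))
        value = sum(phi_node[v][x[v]] for v in nodes)
        value += sum(phi_edge[(u, v)][x[u]][x[v]] for (u, v) in edges)
        best = max(best, value)
    return best

def cavity(nodes, edges, phi_node, phi_edge, T, v):
    """Cavity function of node v: constrained optimum for each action minus that for action 0."""
    base = constrained_optimum(nodes, edges, phi_node, phi_edge, T, {v: 0})
    return [
        constrained_optimum(nodes, edges, phi_node, phi_edge, T, {v: a}) - base
        for a in range(T)
    ]

# Hypothetical MWIS instance on the path a - b - c with weights 1, 1.5, 1.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
phi_node = {"a": [0.0, 1.0], "b": [0.0, 1.5], "c": [0.0, 1.0]}
hard = [[0.0, 0.0], [0.0, -math.inf]]
phi_edge = {e: hard for e in edges}

cavity_a = cavity(nodes, edges, phi_node, phi_edge, 2, "a")
cavity_b = cavity(nodes, edges, phi_node, phi_edge, 2, "b")
print(cavity_a, cavity_b)
```

Here the cavity of action 1 is positive at the endpoint `a` and negative at the middle node `b`, consistent with the optimal independent set being the two endpoints.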

For any network $\mathcal{G}$, we call $M(\mathcal{G}) = \max(|V|, |E|, |\chi|)$ the size of the network. Since we will
exclusively consider graphs with degree bounded by a constant, for all practical purposes we can
think of $|V|$ as the size of the instance. When we say polynomial time algorithm, we mean that the
running time of the algorithm is upper bounded by a polynomial in $|V|$. An algorithm $\mathcal{A}$ is said
to be an $\epsilon$-loss additive approximation algorithm for the problem of finding the optimal decision if
for any network $\mathcal{G}$ it produces in polynomial time a decision $x$ such that $J_{\mathcal{G}} - F(x) < \epsilon$. If all cost
functions are positive, the algorithm $\mathcal{A}$ is said to be a $(1 + \epsilon)$-factor multiplicative approximation
algorithm if it outputs a solution $x$ such that $J_{\mathcal{G}}/F(x) < 1 + \epsilon$. We call such an algorithm an
additive (resp. multiplicative) PTAS (Polynomial Time Approximation Scheme) if it is an $\epsilon$-loss
additive (resp. $(1 + \epsilon)$-factor multiplicative) approximation algorithm for every $\epsilon > 0$
and runs in time which is polynomial in $|V|$. An algorithm is called an FPTAS (Fully Polynomial
Time Approximation Scheme) if it runs in time which is polynomial in $n$ and $1/\epsilon$. For our purposes
another relevant class of algorithms is EPTAS. This is the class of algorithms which produce an $\epsilon$-
approximation in time $O(|V|^{O(1)} g(\epsilon))$, where $g(\epsilon)$ is some function independent of $n$. Namely,
while it is not required that the running time of the algorithm is polynomial in $1/\epsilon$, the $1/\epsilon$ quantity
does not appear in the exponent of $n$. Finally, in our context, since the input is random, we will
say that an algorithm is an additive (resp. multiplicative) PTAS with high probability if for all
$\epsilon > 0$ it outputs in time polynomial in $|V|$ a solution $x$ such that $\mathbb{P}(J_{\mathcal{G}} - F(x) > \epsilon) < \epsilon$ (resp.
$\mathbb{P}(J_{\mathcal{G}}/F(x) > 1 + \epsilon) < \epsilon$); FPTAS and EPTAS w.h.p. are similarly defined. Since our algorithms
provide probabilistic guarantees, one may wonder whether FPRAS (Fully Polynomial Randomized
Approximation Scheme) would be a more appropriate framework. The typical setting for FPRAS is,
however, a deterministic problem input, with the randomization associated purely with the algorithm.
In our setting, however, the input itself is random, though the algorithms, with the exception of
MWIS, are deterministic.

3 Main results 

In this section we state our main results. The first two results relate to decision networks with 
uniformly and normally distributed costs, respectively, without any combinatorial constraints on 
the decisions. The last set of results corresponds to the MWIS problem, which does incorporate 
the combinatorial constraint of the independence property. 

3.1 Uniform and Gaussian Distributions 

Given $\mathcal{G} = (V, E, \Phi, \{0, 1\})$, suppose that for all $u \in V$, $\Phi_u(1)$ is uniformly distributed on $[-I_1, I_1]$,
$\Phi_u(0) = 0$, and that for every $e \in E$, $\Phi_e(0, 0)$, $\Phi_e(1, 0)$, $\Phi_e(0, 1)$ and $\Phi_e(1, 1)$ are all independent
and uniformly distributed on $[-I_2, I_2]$, for some $I_1, I_2 > 0$. Intuitively, $I_1$ quantifies the 'bias' each
agent has towards one action or another, while $I_2$ quantifies the strength of interactions between
agents.
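
For experimentation, a random instance of this uniform model can be sampled as in the following sketch. The function name, the table layout, the seed handling, and the 4-cycle used in the example are assumptions made here; the paper does not prescribe any particular sampling code.

```python
import random

def sample_uniform_network(nodes, edges, I1, I2, seed=0):
    """Sample binary-decision rewards: node bias on [-I1, I1], interactions on [-I2, I2]."""
    rng = random.Random(seed)
    # The reward of action 0 at each node is fixed to 0; only action 1 is random.
    phi_node = {u: [0.0, rng.uniform(-I1, I1)] for u in nodes}
    # Four independent entries per edge, indexed as phi_edge[e][x_u][x_v].
    phi_edge = {
        e: [[rng.uniform(-I2, I2) for _ in range(2)] for _ in range(2)]
        for e in edges
    }
    return phi_node, phi_edge

# Hypothetical instance: a 4-cycle with strong node bias and weak interactions.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
phi_node, phi_edge = sample_uniform_network(nodes, edges, I1=1.0, I2=0.25)
print(phi_node[0], phi_edge[(0, 1)])
```

Choosing the interaction range small relative to the bias range corresponds to the weak-interaction regime in which each agent's own bias tends to dominate its decision.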

Theorem 1. Let j3 = If l3{A — 1)^ < 1, then there exists an additive FPTAS for finding Jg 
with high probability. 

Now we turn to the case of Gaussian costs. Assume that for any edge $e = (u, v)$ and any
pair of actions $(x, y) \in \{0, 1\}^2$, $\Phi_{u,v}(x, y)$ is a Gaussian random variable with mean 0 and standard
deviation $\sigma_e$. For every node $v \in V$, suppose $\Phi_v(1) = 0$ and that $\Phi_v(0)$ is a Gaussian random
variable with mean 0 and standard deviation $\sigma_p$. Assume that all rewards $\Phi_e(x, y)$ and $\Phi_v(x)$ are
independent for all choices of $v, e, x, y$.

Theorem 2. Let [5 = / 2"!! 2 • If /9(A — 1) + Y^^5(A"^^Tp' < 1, then there exists an additive 
FPTAS for finding Jg with high probability. 

While our main result was stated for the case of independent costs, we have obtained a more 
general result which incorporates the case of correlated edge costs. It is given as Proposition [6] in 
Section O 

3.2 Maximum Weight Independent Sets

Here, we consider a variation of the MWIS problem where the nodes of the graph are equipped
with random weights $W_i$, $i \in V$, drawn independently from a common distribution $F(t) = \mathbb{P}(W \le
t)$, $t \ge 0$. Let $I^* = I^*(\mathcal{G})$ be the largest weighted independent set, when it is unique, and let $W(I^*)$
be its weight. In our setting it is a random variable. Observe that $I^*$ is indeed almost surely unique
when $F$ is a continuous distribution.

Theorem 3. If $\Delta_{\mathcal{G}} \le 3$ and the weights are exponentially distributed with parameter 1, then there
exists a multiplicative EPTAS for finding $J_{\mathcal{G}}$ with high probability. The algorithm runs in time



An interesting implication of Theorem 3 is that while the Maximum (cardinality) Independent
Set problem admits neither a polynomial time algorithm nor a PTAS (unless P=NP), even when
the degree is bounded by 3 [BK98, Tre01], the problem of finding the maximum weight independent
set becomes tractable for certain distributions $F$, in the PTAS sense.

The exponential distribution is not the only distribution which can be analyzed in this framework;
it is just the easiest to work with. For any phase-type distribution, we can characterize
correlation decay and identify sufficient conditions for correlation decay to hold. It is natural to
ask if the above result can be generalized, and in particular to wonder if it is possible to find for
each $\Delta$ a distribution which guarantees that the correlation decay property holds for graphs with
degree bounded by $\Delta$. It is indeed possible, as we extend Theorem 3, albeit to the case of mixtures
of exponential distributions. Let $\rho > 25$ be an arbitrary constant and let $\alpha_j = \rho^j$, $j \ge 1$.

Theorem 4. Assume Ag < A, and that the weights are distributed according to P{W > t) = 
exp(— ajt). Then there exists a FPTAS for finding Jg with high probability. The algo- 
rithm runs in time 0(n(-)^]. 



Note that for the mixture of exponential distributions described above our algorithm is in fact
an FPTAS, as opposed to an EPTAS for Theorem 3. This is essentially due to the fact that the
conditions of Theorem 3 are at the 'boundary' of correlation decay; more technical details are given
in Section 7.

Our final result is a partial converse to the results above; one could conjecture that randomizing 
the weights makes the problem essentially easy to solve, and that perhaps being able to solve the 
randomized version does not tell us much about the deterministic version. We show that this is 
not the case, and that the setting with random weights hits a complexity-theoretic barrier just 
as the classical cardinality problem does. Specifically, we show that for graphs with sufficiently 
large degree the problem of finding the largest weighted independent set with i.i.d. exponentially 
distributed weights does not admit a PTAS. We need to keep in mind that since we are dealing with
instances which are random (in terms of weights) and worst-case (in terms of the underlying graph)
at the same time, we need to be careful as to the notion of hardness we use.

Specifically, for any $\rho < 1$, define an algorithm $\mathcal{A}$ to be a factor-$\rho$ polynomial time approximation
algorithm for computing $\mathbb{E}[W(I^*)]$ for graphs with degree at most $\Delta$, if given any graph with
degree at most $\Delta$, $\mathcal{A}$ produces a value $w$ such that $\rho \le w / \mathbb{E}[W(I^*)] \le 1/\rho$ in time bounded by
$O(n^{O(1)})$. Here the expectation is with respect to the exponential weight distribution and the
constant exponent $O(1)$ is allowed to depend on $\Delta$.

En route to Theorems 3 and 4 we establish similar results for expectations: there exists an
EPTAS and FPTAS respectively for computing the deterministic quantity $\mathbb{E}[W(I^*)]$, the expected
weight of the MWIS in the graph $\mathcal{G}$ considered.

However, our next result shows that if the maximum degree of the graph is increased, it is
impossible to approximate the quantity $\mathbb{E}[W(I^*)]$ arbitrarily closely, unless P=NP. Specifically,

Theorem 5. There exist $\Delta_0$ and $c_1^*, c_2^*$ such that for all $\Delta \ge \Delta_0$ the problem of computing $\mathbb{E}[W(I^*)]$
to within a multiplicative factor $\rho = \Delta / (c_1^* (\log \Delta) 2^{c_2^* \sqrt{\log \Delta}})$ for graphs with degree at most $\Delta$ cannot
be solved in polynomial time, unless P=NP.

We could compute a concrete $\Delta_0$ such that for all $\Delta \ge \Delta_0$ the claim of the theorem holds,
though computing such a $\Delta_0$ explicitly does not seem to offer much insight. We note that in the related work
by Trevisan [Tre01], no attempt is made to compute a similar bound either.

4 The cavity recursion 

In this section, we introduce the cavity recursion, an exact recursion for computing the cavity 
functions of each node in a general decision network. We first start by giving the cavity recursion 
for trees (which is already known as the max-product belief propagation algorithm), and then give 
a generalization for all networks. 

4.1 Trees 

Given a decision network $\mathcal{G} = (V, E, \Phi, \chi)$, suppose that $(V, E)$ is a rooted tree with a root $u$. Using
the graph orientation induced by the choice of $u$ as a root, let $\mathcal{G}_v$ be the subtree rooted in node
$v$ for any node $v \in V$. In particular, $\mathcal{G} = \mathcal{G}_u$. Denote by $C(u)$ the set of children of $u$ in $(V, E)$.
Given a node $u \in V$, a child $v \in C(u)$, and an arbitrary vector $B = (B(x), x \in \chi)$, define

$$\mu_{u \leftarrow v}(x, B) = \max_y \left(\Phi_{u,v}(x, y) + B(y)\right) - \max_y \left(\Phi_{u,v}(0, y) + B(y)\right) \qquad (1)$$

for every action $x \in \chi$. $\mu$ is called the partial cavity function.

Proposition 1. For every $u$ and $x \in \chi$,

$$B_u(x) = \Phi_u(x) - \Phi_u(0) + \sum_{v \in C(u)} \mu_{u \leftarrow v}(x, B_{\mathcal{G}_v, v}) \qquad (2)$$

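Proof. Suppose $C(u) = \{v_1, \ldots, v_d\}$. Observe that the subtrees $\mathcal{G}_{v_i}$, $1 \le i \le d$, are disconnected (see
Figure 1). Thus,

$$B_u(x) = \Phi_u(x) + \max_{x_1, \ldots, x_d} \Big\{ \sum_{j=1}^{d} \big( \Phi_{u,v_j}(x, x_j) + J_{\mathcal{G}_{v_j}, v_j}(x_j) \big) \Big\} - \Phi_u(0) - \max_{x_1, \ldots, x_d} \Big\{ \sum_{j=1}^{d} \big( \Phi_{u,v_j}(0, x_j) + J_{\mathcal{G}_{v_j}, v_j}(x_j) \big) \Big\}$$

$$= \Phi_u(x) - \Phi_u(0) + \sum_{j=1}^{d} \Big\{ \max_y \big( \Phi_{u,v_j}(x, y) + J_{\mathcal{G}_{v_j}, v_j}(y) \big) - \max_y \big( \Phi_{u,v_j}(0, y) + J_{\mathcal{G}_{v_j}, v_j}(y) \big) \Big\}$$

For every $j$,

$$\max_y \big( \Phi_{u,v_j}(x, y) + J_{\mathcal{G}_{v_j}, v_j}(y) \big) - \max_y \big( \Phi_{u,v_j}(0, y) + J_{\mathcal{G}_{v_j}, v_j}(y) \big) =$$
$$\max_y \big( \Phi_{u,v_j}(x, y) + J_{\mathcal{G}_{v_j}, v_j}(y) - J_{\mathcal{G}_{v_j}, v_j}(0) \big) - \max_y \big( \Phi_{u,v_j}(0, y) + J_{\mathcal{G}_{v_j}, v_j}(y) - J_{\mathcal{G}_{v_j}, v_j}(0) \big)$$

The quantity above is exactly $\mu_{u \leftarrow v_j}(x, B_{\mathcal{G}_{v_j}, v_j})$. □

Figure 1: Cavity recursion for trees, equivalent to the BP algorithm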

Iteration (2) constitutes what is known as (max-product) belief propagation. Proposition 1 is
the restatement of the well-known fact that BP finds an optimal solution on a tree (citation). BP
can be implemented in non-tree like graphs, but then it is not guaranteed to converge, and even
when it does it may produce wrong (suboptimal) solutions. In the following section we construct
a generalization of BP which is guaranteed to converge to an optimal decision.
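
The tree recursion of Proposition 1 translates directly into a short recursive procedure. The sketch below is one possible rendering, assuming the tree is given as a dictionary mapping each node to its list of children and that edge rewards are stored per (parent, child) pair; these representation choices, the function names, and the example instance are illustrative assumptions.

```python
import math

def partial_cavity(phi_uv, B, x):
    """Partial cavity function of equation (1); phi_uv[x][y] is the edge reward."""
    T = len(B)
    with_x = max(phi_uv[x][y] + B[y] for y in range(T))
    with_0 = max(phi_uv[0][y] + B[y] for y in range(T))
    return with_x - with_0

def tree_cavity(u, children, phi_node, phi_edge):
    """Cavity function of u in the subtree rooted at u, via recursion (2)."""
    T = len(phi_node[u])
    # Cavity functions of the children, each computed in its own subtree.
    child_cavities = {
        v: tree_cavity(v, children, phi_node, phi_edge)
        for v in children.get(u, [])
    }
    return [
        phi_node[u][x] - phi_node[u][0]
        + sum(partial_cavity(phi_edge[(u, v)], B_v, x)
              for v, B_v in child_cavities.items())
        for x in range(T)
    ]

# Hypothetical MWIS instance: root b with leaf children a and c.
children = {"b": ["a", "c"]}
phi_node = {"a": [0.0, 1.0], "b": [0.0, 1.5], "c": [0.0, 1.0]}
hard = [[0.0, 0.0], [0.0, -math.inf]]  # both endpoints selected is forbidden
phi_edge = {("b", "a"): hard, ("b", "c"): hard}

B_root = tree_cavity("b", children, phi_node, phi_edge)
print(B_root)
```

For this instance the cavity of action 1 at the root is negative, so the root is left out and the two leaves form the optimal independent set. The recursion visits each node once, in contrast with enumeration over all solutions.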



4.2 General graphs 

The goal of this subsection is to construct a generalization of identity (2) for an arbitrary network
$\mathcal{G}$. This can be achieved by building a sequence of certain auxiliary decision networks $\mathcal{G}(u, j, x)$
constructed as follows.

Given a decision network $\mathcal{G} = (V, E, \Phi, \chi)$ where the underlying graph is arbitrary, fix any node
$u$ and action $x$ and let $\mathcal{N}(u) = \{v_1, \ldots, v_d\}$. For every $j = 1, \ldots, d$ let $\mathcal{G}(u, j, x)$ be the decision
network $(V', E', \Phi', \chi)$ on the same decision set $\chi$ constructed as follows. $(V', E')$ is the subgraph
induced by $V' = V \setminus \{u\}$. Namely, $E' = E \setminus \{(u, v_1), \ldots, (u, v_d)\}$. Also $\Phi'_e = \Phi_e$ for all $e$ in $E'$
and the potential functions $\Phi'_v$ are defined as follows. For any $v \in V \setminus \{u, v_1, \ldots, v_{j-1}, v_{j+1}, \ldots, v_d\}$,
$\Phi'_v = \Phi_v$, and

$$\Phi'_v(y) = \Phi_v(y) + \Phi_{u,v}(x, y) \quad \text{for } v \in \{v_1, \ldots, v_{j-1\}}\}$$
$$\Phi'_v(y) = \Phi_v(y) + \Phi_{u,v}(0, y) \quad \text{for } v \in \{v_{j+1}, \ldots, v_d\} \qquad (3)$$

Theorem 6 (Cavity Recursion). For every $x \in \chi$,

$$B_u(x) = \Phi_u(x) - \Phi_u(0) + \sum_{j=1}^{d} \mu_{u \leftarrow v_j}(x, B_{\mathcal{G}(u,j,x), v_j}) \qquad (4)$$

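Proof. For every $k = 0, 1, \ldots, d$, let $x_{j,k} = x$ when $j \le k$ and $x_{j,k} = 0$ otherwise. Let $v = (v_1, \ldots, v_d)$,
and $z = (z_1, \ldots, z_d) \in \chi^d$. We have

$$B_u(x) = \Phi_u(x) - \Phi_u(0) + \max_z \Big\{ \sum_{j=1}^{d} \Phi_{u,v_j}(x, z_j) + J_{\mathcal{G} \setminus \{u\}, v}(z) \Big\} - \max_z \Big\{ \sum_{j=1}^{d} \Phi_{u,v_j}(0, z_j) + J_{\mathcal{G} \setminus \{u\}, v}(z) \Big\}$$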

The first step of the proof consists in considering the following telescoping sum (see figure 14. 2p : 

d 



Sjx) = $„(x)-$„(0) + ^ 



k=l 



d 

max 

d 

max I ^ 



{ ^n,v,{xj^k, Zj) + Jg\{„},v(z)} (5) 
'Y<^u,VjiXj^k-l,Zj) + Jg\{.a},v(z)} 



and the /c*^ difference: 



a a 
- I ^ '^u,Vj{xj,k, Zj) + Jg\{„},v(z)} - maxj ^ ^u,Vj {Xj,k-1, Zj) + Jg\{„}^v(z)} (6) 



max ■ 

z 
Figure 2: First step: building the telescoping sum; black nodes indicate decision x, gray nodes
decision 0; solid circles indicate neighbors of u, dotted circles indicate other nodes.

Figure 3: Second step: building the modified subnetworks (here $\mathcal{G}(u,2,x)$); arrows represent the
modification of the potential functions by incorporating the interaction functions into them.
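Let $\mathbf{z}_{-k} = (z_1, \dots, z_{k-1}, z_{k+1}, \dots, z_d)$. Then,

$$\max_{\mathbf{z}} \Big\{ \sum_{j=1}^{d} \Phi_{u,v_j}(x_{j,k}, z_j) + J_{\mathcal{G}\setminus\{u\},\mathbf{v}}(\mathbf{z}) \Big\} = \max_{z_k} \Big( \Phi_{u,v_k}(x, z_k) + \max_{\mathbf{z}_{-k}} \Big\{ \sum_{j \le k-1} \Phi_{u,v_j}(x, z_j) + \sum_{j \ge k+1} \Phi_{u,v_j}(0, z_j) + J_{\mathcal{G}\setminus\{u\},\mathbf{v}}(\mathbf{z}) \Big\} \Big). \tag{7}$$

Similarly,

$$\max_{\mathbf{z}} \Big\{ \sum_{j=1}^{d} \Phi_{u,v_j}(x_{j,k-1}, z_j) + J_{\mathcal{G}\setminus\{u\},\mathbf{v}}(\mathbf{z}) \Big\} = \max_{z_k} \Big( \Phi_{u,v_k}(0, z_k) + \max_{\mathbf{z}_{-k}} \Big\{ \sum_{j \le k-1} \Phi_{u,v_j}(x, z_j) + \sum_{j \ge k+1} \Phi_{u,v_j}(0, z_j) + J_{\mathcal{G}\setminus\{u\},\mathbf{v}}(\mathbf{z}) \Big\} \Big). \tag{8}$$

For each $z_k$, we have (see Figure 3):

$$\max_{\mathbf{z}_{-k}} \Big\{ \sum_{j \le k-1} \Phi_{u,v_j}(x, z_j) + \sum_{j \ge k+1} \Phi_{u,v_j}(0, z_j) + J_{\mathcal{G}\setminus\{u\},\mathbf{v}}(\mathbf{z}) \Big\} = J_{\mathcal{G}(u,k,x),v_k}(z_k).$$

By adding and subtracting $J_{\mathcal{G}(u,k,x),v_k}(0)$, expression (6) can therefore be rewritten as

$$\max_{y} \big( \Phi_{u,v_k}(x,y) + B_{\mathcal{G}(u,k,x),v_k}(y) \big) - \max_{y} \big( \Phi_{u,v_k}(0,y) + B_{\mathcal{G}(u,k,x),v_k}(y) \big),$$

which is exactly $\mu_{u \leftarrow v_k}(x, B_{\mathcal{G}(u,k,x),v_k})$. Finally, we obtain:

$$B_u(x) = \Phi_u(x) - \Phi_u(0) + \sum_{k=1}^{d} \mu_{u \leftarrow v_k}\big(x, B_{\mathcal{G}(u,k,x),v_k}\big).$$

□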



4.3 Computation tree and the Cavity Expansion algorithm 

Given a decision network $\mathcal{G}$, a node $u \in V$ with $\mathcal{N}(u) = \{v_1, \dots, v_d\}$, and $r \in \mathbb{Z}_+$, introduce a vector
$CE[\mathcal{G},u,r] = (CE[\mathcal{G},u,r,x], x \in \chi) \in \mathbb{R}^T$ defined recursively as follows.

1. $CE[\mathcal{G},u,0,x] = 0$.

2. For every $r = 1, 2, \dots$, and every $x \in \chi$,

$$CE[\mathcal{G},u,r,x] = \Phi_u(x) - \Phi_u(0) + \sum_{j=1}^{d} \mu_{u \leftarrow v_j}\big(x, CE[\mathcal{G}(u,j,x), v_j, r-1]\big), \tag{9}$$

where $\mathcal{G}(u,j,x)$ is defined in Subsection 4.2 and the sum $\sum_{j=1}^{d}$ is equal to 0 when $\mathcal{N}(u) = \emptyset$. Note
that from the definition of $\mathcal{G}(u,j,x)$, the definition and output of $CE[\mathcal{G},u,r]$ depend on the order
in which the neighbors $v_j$ of $u$ are considered. $CE[\mathcal{G},u,r]$ serves as an $r$-step approximation, in
some appropriate sense to be explained later, of the cavity vector $B_{\mathcal{G},u}$. The motivation for this
definition is relation (4) of Theorem 6. The local cavity approximation can be computed using an
algorithm described below, which we call the Cavity Expansion (CE) algorithm.



Cavity Expansion: $CE[\mathcal{G},u,r,x]$
INPUT: A network $\mathcal{G}$, a node $u$ in $\mathcal{G}$, an action $x$ and a computation depth $r \ge 0$
BEGIN

If $r = 0$ return 0
else do

Find neighbors $\mathcal{N}(u) = \{v_1, v_2, \dots, v_d\}$ of $u$ in $\mathcal{G}$.

If $\mathcal{N}(u) = \emptyset$, return $\Phi_u(x) - \Phi_u(0)$.

Else

For each $j = 1, \dots, d$, construct the network $\mathcal{G}(u,j,x)$.

For each $j = 1, \dots, d$, and $y \in \chi$, compute $CE[\mathcal{G}(u,j,x), v_j, r-1, y]$.

For each $j = 1, \dots, d$, compute $\mu_{u \leftarrow v_j}(x, CE[\mathcal{G}(u,j,x), v_j, r-1])$.

Return $\Phi_u(x) - \Phi_u(0) + \sum_{1 \le j \le d} \mu_{u \leftarrow v_j}(x, CE[\mathcal{G}(u,j,x), v_j, r-1])$ as $CE[\mathcal{G},u,r,x]$.
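The recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the dictionary layout for potentials, the helper names (`edge_value`, `neighbors`, `cavity_network`, `cavity_expansion`, `brute_force_cavity`, `random_network`), the binary action set and the sorted neighbor order are all hypothetical choices made here. The sketch uses the zero boundary condition and compares the output against a brute-force cavity function on a small graph with cycles.

```python
import itertools
import random

ACTIONS = (0, 1)  # assumed decision set chi; 0 is the reference action


def edge_value(edge_pot, u, v, xu, xv):
    """Phi_{u,v}(xu, xv), whichever orientation the edge is stored in."""
    if (u, v) in edge_pot:
        return edge_pot[(u, v)][(xu, xv)]
    return edge_pot[(v, u)][(xv, xu)]


def neighbors(edge_pot, u):
    """Neighbors of u in a fixed (sorted) order; CE depends on this order."""
    return sorted(b if a == u else a for (a, b) in edge_pot if u in (a, b))


def cavity_network(node_pot, edge_pot, u, j, x):
    """Build G(u, j, x): delete u and fold its edges into neighbor potentials."""
    nbrs = neighbors(edge_pot, u)
    new_nodes = {v: dict(p) for v, p in node_pot.items() if v != u}
    for i, v in enumerate(nbrs):
        if i == j:
            continue  # v_j keeps its original potential
        xu = x if i < j else 0  # earlier neighbors see x, later ones see 0
        for y in ACTIONS:
            new_nodes[v][y] += edge_value(edge_pot, u, v, xu, y)
    new_edges = {e: p for e, p in edge_pot.items() if u not in e}
    return new_nodes, new_edges


def cavity_expansion(node_pot, edge_pot, u, r):
    """Return {x: CE[G, u, r, x]} with the zero boundary condition."""
    if r == 0:
        return {x: 0.0 for x in ACTIONS}
    nbrs = neighbors(edge_pot, u)
    out = {}
    for x in ACTIONS:
        total = node_pot[u][x] - node_pot[u][0]
        for j, v in enumerate(nbrs):
            sub_nodes, sub_edges = cavity_network(node_pot, edge_pot, u, j, x)
            z = cavity_expansion(sub_nodes, sub_edges, v, r - 1)
            # message mu_{u<-v}(x, z)
            total += (max(edge_value(edge_pot, u, v, x, y) + z[y] for y in ACTIONS)
                      - max(edge_value(edge_pot, u, v, 0, y) + z[y] for y in ACTIONS))
        out[x] = total
    return out


def brute_force_cavity(node_pot, edge_pot, u):
    """Exact cavity function B_u(x) = J_u(x) - J_u(0) by enumeration."""
    nodes = sorted(node_pot)
    best = {x: float("-inf") for x in ACTIONS}
    for assign in itertools.product(ACTIONS, repeat=len(nodes)):
        xs = dict(zip(nodes, assign))
        val = sum(node_pot[v][xs[v]] for v in nodes)
        val += sum(p[(xs[a], xs[b])] for (a, b), p in edge_pot.items())
        best[xs[u]] = max(best[xs[u]], val)
    return {x: best[x] - best[0] for x in ACTIONS}


def random_network(edges, n, seed=0):
    """Hypothetical test instance with i.i.d. uniform rewards on [-1, 1]."""
    rng = random.Random(seed)
    node_pot = {v: {x: rng.uniform(-1, 1) for x in ACTIONS} for v in range(n)}
    edge_pot = {e: {(a, b): rng.uniform(-1, 1) for a in ACTIONS for b in ACTIONS}
                for e in edges}
    return node_pot, edge_pot
```

Each recursive call removes one node, so on a network with $n$ nodes a depth $r \ge n$ never reaches the boundary with a nonempty neighborhood, and by Theorem 6 the sketch should then reproduce the exact cavity function. Smaller depths give the local approximation studied in the rest of the paper.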



The algorithm above terminates because $r$ decreases by one at each recursive call of the algorithm.
As a result, an initial call to $CE[\mathcal{G},u,r,x]$ will result in a finite number of recursive calls to
some $CE[\mathcal{G}_j,u_j,k_j,x_j]$, where $k_j < r$. Let $(\mathcal{G}_i,v_i,x_i)_{1 \le i \le m}$ be the subset of arguments for the calls
used in computing $CE[\mathcal{G},u,r,x]$ for which $k_i = 0$. In the algorithm above, the values returned for
$r = 0$ are 0, but it can be generalized by choosing a value $C_i$ for the call $CE[\mathcal{G}_i,v_i,0,x_i]$.

The set of values $C = (C_i)_{1 \le i \le m}$ will be called a boundary condition. We denote by $CE[\mathcal{G},u,r,x,C]$
the output of the cavity algorithm with boundary condition $C$. The interpretation of $CE[\mathcal{G},u,r,x,C]$
is that it is an estimate of the cavity $B_{\mathcal{G},u}(x)$ via $r$ steps of recursion (2) when the recursion is
initialized by setting $CE[\mathcal{G}_i,v_i,0,x_i] = C_i$ and is run $r$ steps. We will sometimes omit $C$ from the
notation when such specification is not necessary. Call $C^* = (C_i^*) = (B_{\mathcal{G}_i,v_i}(x_i))$ the "true boundary
condition". The justification comes from the following proposition, the proof of which follows
directly from Theorem 6.

Proposition 2. Given node $u$ and $\mathcal{N}(u) = \{v_1, \dots, v_d\}$, suppose for every $j = 1, \dots, d$ and $y \in \chi$,
$CE[\mathcal{G}(u,j,x),v_j,r-1,y] = B_{\mathcal{G}(u,j,x),v_j}(y)$; then, $CE[\mathcal{G},u,r,x] = B_{\mathcal{G},u}(x)$.

As a result, if $C$ is the "correct" boundary condition, then $CE[\mathcal{G},u,r,x,C] = B_{\mathcal{G},u}(x)$ for every
$u, r, x$. The execution of the Cavity Expansion algorithm can be visualized as a computation on
a tree, due to its recursive nature. This has some similarity with a computation tree associated
with the performance of the Belief Propagation algorithm [TJ02], [SSW07], [BSS08]. The important
difference with [TJ02] is that the presence of cycles is incorporated via the construction $\mathcal{G}(u,j,x)$
(similarly to [Wei06], [JS07], [BGK+07], [GK07a], [GK07b]). As a result, the computation tree of the CE
is finite (though often extremely large), as opposed to the BP computation tree.

An important lemma, which we will use frequently in the rest of the paper, states that in the
computation tree of the cavity recursion, the cost function of an edge is statistically independent
from the subtree below that edge.

Proposition 3. Given $u$, $x$ and $\mathcal{N}(u) = \{v_1, \dots, v_d\}$, for every $r$, $j = 1, \dots, d$ and $y \in \chi$,
$CE[\mathcal{G}(u,j,x),v_j,r-1,y]$ and $\Phi_{u,v_j}$ are independent.

Note however that $\Phi_{u,v_j}$ and $CE[\mathcal{G}(u,k,x),v_k,r-1,y]$ are generally dependent when $j \ne k$.

Proof. The proposition follows from the fact that for any $j$, the interaction function $\Phi_{u,v_j}$ does
not appear in $\mathcal{G}(u,j,x)$ (because node $u$ does not belong to $\mathcal{G}(u,j,x)$), and does not modify the
potential functions of $\mathcal{G}(u,j,x)$ in the step (3). □

Our last proposition analyzes the complexity of running the Cavity Expansion algorithm. 

Proposition 4. For every $\mathcal{G},u,r,x$, the value $CE[\mathcal{G},u,r,x]$ can be computed in time $O(r(\Delta T)^r)$.

Proof. The computation time required to construct the networks $\mathcal{G}(u,j,x)$, compute the messages
$\mu_{u \leftarrow v_j}(x, B_{v_j})$, and return $\Phi_u(x) - \Phi_u(0) + \sum_{1 \le j \le d} \mu_{u \leftarrow v_j}(x, B_{v_j})$ is $O(\Delta T)$. Let us prove by
induction that for any subnetwork $\mathcal{G}'$ of $\mathcal{G}$, $CE[\mathcal{G}',u,r,x]$ can be computed in time bounded
by $O(r(\Delta T)^r)$. The values for $r = 1$ can be computed in time bounded by $O(\Delta T)$, since $\mathcal{G}'$ is a
subnetwork of $\mathcal{G}$ and therefore of smaller size. For $r > 1$, the computation of $CE[\mathcal{G}',u,r,x]$ requires a fixed
cost of $O(\Delta T)$, as well as $(\Delta T)$ calls to CE with depth $(r-1)$. The total cost is therefore bounded
by $O(\Delta T + (\Delta T)(r-1)(\Delta T)^{r-1})$, which is $O(r(\Delta T)^r)$. □

5 Correlation decay and decentralized optimization 

In this section, we investigate the relations between the correlation decay phenomenon and the 
existence of near-optimal decentralized decisions. When a network exhibits the correlation decay 
property, the cavity functions of faraway nodes are weakly related, implying a weak dependence 
between their optimal decisions as well. Thus one can expect that good decentralized decisions 
exist. We will show that this is indeed the case. 

Definition 1. Given a function $\rho(r) \ge 0$, $r \in \mathbb{Z}_+$, such that $\lim_{r \to \infty} \rho(r) = 0$, a decision network $\mathcal{G}$
is said to satisfy the correlation decay property with rate $\rho$ if for every two boundary conditions $C$, $C'$,

$$\max_{u,x} E\big|CE[\mathcal{G},u,r,x,C] - CE[\mathcal{G},u,r,x,C']\big| \le \rho(r).$$

If there exist $K_c > 0$ and $\alpha_c < 1$ such that $\rho(r) \le K_c \alpha_c^r$ for all $r$, then we say that $\mathcal{G}$ satisfies
the exponential correlation decay property with rate $\alpha_c$.

The correlation decay property implies that for every $u, x$,

$$E\big|CE[\mathcal{G},u,r,x] - B_{\mathcal{G},u}(x)\big| \le \rho(r).$$

The following assumptions will be used frequently in what follows.

Assumption 1. For all $v \in V$, $x, y \in \chi$, $B_v(x) - B_v(y)$ is a continuous random variable with
density bounded above by a constant $g > 0$.

We will also assume the cost functions are bounded in $L_2$ norm:

Assumption 2. There exists $K_\Phi$ such that for any $e \in E$ and $x, y \in \chi$, $(E|\Phi_e(x,y)|^2)^{1/2} \le K_\Phi$, and for
any $v \in V$ and $x \in \chi$, $(E|\Phi_v(x)|^2)^{1/2} \le K_\Phi$.

Assumption 1 is designed to lead to the following two properties: (a) There is a unique optimal
action in every node with probability 1. (b) The suboptimality gap between the optimal action
and the second best action is large enough so that there is a "clear winner" among actions.

5.1 Correlation decay implies near-optimal decentralized decisions 

Under Assumption 1 let $x = (x_v)_{v \in V}$ be the unique (with probability one) optimal solution for the
network $\mathcal{G}$. For every $v \in V$, let $x_v^r = \arg\max_{x \in \chi} CE[\mathcal{G},v,r,x]$, ties broken arbitrarily, and
$x^r = (x_v^r)$. The main relation between the correlation decay property, the Cavity Expansion algorithm and
the optimization problem is given by the following result.

Proposition 5. Suppose $\mathcal{G}$ exhibits the correlation decay property with rate $\rho(r)$ and that Assumption 1
holds. Then,

$$P(x_u^r \ne x_u) \le 2T^2 \sqrt{2g\rho(r)}, \quad \forall u \in V, \ r \ge 1. \tag{10}$$

Proof. For simplicity, let $B_u^r(x)$ denote $CE[\mathcal{G},u,r,x]$. We will first prove that

$$P(x_u^r \ne x_u) \le T^2 \Big( g\epsilon + \frac{2\rho(r)}{\epsilon} \Big). \tag{11}$$

The proposition will follow by choosing $\epsilon = \sqrt{2\rho(r) g^{-1}}$. Consider a node $u$, and notice that if

$$(B_u(x) - B_u(y))(B_u^r(x) - B_u^r(y)) > 0, \quad \forall x \ne y,$$

then $x_u^r = x_u$. Indeed, since $B_u(x_u) - B_u(y) > 0$ for all $y \ne x_u$, the property implies the same for
$B_u^r$, and the assertion holds. Thus, the event $\{x_u^r \ne x_u\}$ implies the event

$$\{\exists (x,y), y \ne x : (B_u(x) - B_u(y))(B_u^r(x) - B_u^r(y)) \le 0\}.$$

Fix $\epsilon > 0$ and note that for two real numbers $r$ and $s$, if $|r| > \epsilon$ and $|r - s| \le \epsilon$, then $rs > 0$. Applying
this to $r = B_u(x) - B_u(y)$ and $s = B_u^r(x) - B_u^r(y)$, we find that the events $|B_u(x) - B_u(y)| > \epsilon$ and

$$\big(|B_u(x) - B_u^r(x)| < \epsilon/2\big) \cap \big(|B_u(y) - B_u^r(y)| < \epsilon/2\big)$$

jointly imply

$$(B_u(x) - B_u(y))(B_u^r(x) - B_u^r(y)) > 0.$$

Therefore, the event $(B_u(x) - B_u(y))(B_u^r(x) - B_u^r(y)) \le 0$ implies

$$\{|B_u(x) - B_u(y)| \le \epsilon\} \cup \{|B_u(x) - B_u^r(x)| \ge \epsilon/2\} \cup \{|B_u(y) - B_u^r(y)| \ge \epsilon/2\}.$$

Applying the union bound, for any two actions $x \ne y$,

$$P\big((B_u(x) - B_u(y))(B_u^r(x) - B_u^r(y)) \le 0\big) \le P(|B_u(x) - B_u(y)| \le \epsilon) + P(|B_u(x) - B_u^r(x)| \ge \epsilon/2) + P(|B_u(y) - B_u^r(y)| \ge \epsilon/2). \tag{12}$$

Now $P(|B_u(x) - B_u(y)| \le \epsilon)$ is at most $2g\epsilon$ by Assumption 1. Using the Markov inequality, we
find that the second summand in (12) is at most $2E|B_u(x) - B_u^r(x)|/\epsilon \le 2\rho(r)/\epsilon$. The same bound
applies to the third summand. Finally, noting there are $T(T-1)/2$ different pairs $(x,y)$ with $x \ne y$
and applying the union bound, we obtain:

$$P(x_u^r \ne x_u) \le (T(T-1)/2)(2g\epsilon + 4\rho(r)/\epsilon).$$

□
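As a quick sanity check on the choice of $\epsilon$ in the proof above, the following sketch evaluates the right-hand side of (11) and confirms numerically that $\epsilon = \sqrt{2\rho(r)/g}$ minimizes it and yields the constant in (10). The function names and the sample values of $T$, $g$ and $\rho$ are illustrative assumptions, not quantities taken from the paper.

```python
import math


def prop5_bound(T, g, rho, eps):
    """Right-hand side T^2 (g*eps + 2*rho/eps) of inequality (11)."""
    return T ** 2 * (g * eps + 2.0 * rho / eps)


def best_eps(g, rho):
    """Minimizer of g*eps + 2*rho/eps over eps > 0."""
    return math.sqrt(2.0 * rho / g)


def prop5_final(T, g, rho):
    """Right-hand side 2 T^2 sqrt(2 g rho) of inequality (10)."""
    return 2.0 * T ** 2 * math.sqrt(2.0 * g * rho)
```

Setting the derivative of $g\epsilon + 2\rho/\epsilon$ to zero gives $g = 2\rho/\epsilon^2$, which is the stated choice; both terms then equal $\sqrt{2g\rho}$.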

For the special case of exponential correlation decay, we obtain the following result, the proof
of which immediately follows from Proposition 5.

Corollary 1. Suppose $\mathcal{G}$ exhibits the exponential correlation decay property with rate $\alpha_c$, and
suppose Assumption 1 holds. Then

$$P(x_u^r \ne x_u) \le 2T^2 \sqrt{2gK_c}\, \alpha_c^{r/2}, \quad \forall u \in V, \ r \ge 1.$$

In particular, for any $\epsilon > 0$, if

$$r \ge 2\, \frac{|\log K'_c| + |\log \epsilon|}{|\log(\alpha_c)|}$$

then

$$P(x_u^r \ne x_u) \le \epsilon,$$

where $K'_c = 2T^2 \sqrt{2gK_c}$.

In summary, correlation decay, and in particular fast (i.e. exponential) correlation decay,
implies that the optimal action in a node depends with high probability only on the structure of
the network in a small radius around the node. As in [RR03], we call such a property decentralization
of optimal actions. Note that the radius required to achieve an $\epsilon$ error does not depend on the size
of the entire network; moreover, for exponential correlation decay, it grows only as a logarithm of
the accepted error.

The main caveat of Proposition 5 is that Assumption 1 does not necessarily hold. For
instance, it definitely does not apply to models with discrete random variables $\Phi_u$ and $\Phi_{u,v}$. In
fact, Assumption 1 is not really necessary, as we show in an online appendix that a regularization
technique allows us to relax this assumption. Note that Assumption 2 is not needed for Proposition
5 to hold.

5.2 Correlation decay and efficient decentralized optimization

Proposition 5 illustrates how optimal actions are decentralized under the correlation decay property.
In this section, we use this result to show that the resulting optimization algorithm is both
near-optimal and computationally efficient.

As before, let $x = (x_u)$ denote the optimal solution for the network $\mathcal{G}$, and let $x^r = (x_u^r)$
be the decisions resulting from the Cavity Expansion algorithm with depth $r$. Let $\hat{x} = (\hat{x}_u)$
denote (any) optimal solution for the perturbed network $\hat{\mathcal{G}}$. Let $K_1 = 10 K_\Phi T (|V| + |E|)$, and
$K_2 = K_1 (gK_c)^{1/4}$, where $K_c$ is defined under the assumption of exponential correlation decay.
Theorem 7. Suppose a decision network $\mathcal{G}$ satisfies the correlation decay property with rate $\rho(r)$.
Then, for all $r \ge 0$,

$$E[F(x) - F(x^r)] \le K_1 (g\rho(r))^{1/4}. \tag{13}$$

Corollary 2. Suppose $\mathcal{G}$ exhibits the exponential correlation decay property with rate $\alpha_c$. Then, for
any $\epsilon > 0$, if

$$r \ge (8|\log \epsilon| + 4|\log(K_2)|)\, |\log(\alpha_c)|^{-1}$$

then

$$P(F(x) - F(x^r) > \epsilon) \le \epsilon$$

and $x^r$ can be computed in time polynomial in $|V|$, $1/\epsilon$.

Proof. By applying the union bound on Proposition 5, for every $(u,v)$, we have:
$P((x_u^r, x_v^r) \ne (x_u, x_v)) \le 4T^2 \sqrt{2g\rho(r)}$. We have

$$E|F(x) - F(x^r)| \le \sum_{u \in V} E|\Phi_u(x_u) - \Phi_u(x_u^r)| + \sum_{(u,v) \in E} E|\Phi_{u,v}(x_u, x_v) - \Phi_{u,v}(x_u^r, x_v^r)|.$$

For any $u, v \in V$,

$$E|\Phi_{u,v}(x_u, x_v) - \Phi_{u,v}(x_u^r, x_v^r)| \le 2K_\Phi\, P\big((x_u^r, x_v^r) \ne (x_u, x_v)\big)^{1/2} \le 4K_\Phi T (2g\rho(r))^{1/4},$$

where the first inequality follows from Cauchy-Schwarz and the second from the bound above. Similarly, for any $u$ we have

$$E|\Phi_u(x_u) - \Phi_u(x_u^r)| \le 4K_\Phi T (2g\rho(r))^{1/4}.$$

By summing over all nodes and edges, we get: $E[F(x) - F(x^r)] \le 8K_\Phi T (|V| + |E|)(2g\rho(r))^{1/4} \le K_1 (g\rho(r))^{1/4}$,
and equation (13) follows. The corollary is then proved using the Markov inequality; injecting the
definition of exponential correlation decay into equation (13), we obtain

$$P(J_{\mathcal{G}} - F(x^r) > \epsilon) \le E[J_{\mathcal{G}} - F(x^r)]/\epsilon \le K_2 \alpha_c^{r/4}/\epsilon.$$

Since $r \ge (4|\log(K_2)| + 8|\log(\epsilon)|)\, |\log(\alpha_c)|^{-1}$, we have $K_2 \alpha_c^{r/4} \le \epsilon^2$ and the result follows.

□
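To make the depth requirement of Corollary 2 concrete, the sketch below computes the smallest integer depth satisfying the stated condition and lets one check that it indeed forces $K_2 \alpha_c^{r/4}/\epsilon \le \epsilon$. The function name and the sample constants are hypothetical; in practice $K_2$ and $\alpha_c$ come from the network and its reward distribution.

```python
import math


def depth_for_error(eps, K2, alpha):
    """Smallest integer r with r >= (8|log eps| + 4|log K2|) / |log alpha|."""
    needed = (8 * abs(math.log(eps)) + 4 * abs(math.log(K2))) / abs(math.log(alpha))
    return math.ceil(needed)


def markov_bound(eps, K2, alpha, r):
    """Upper bound K2 * alpha^(r/4) / eps on P(F(x) - F(x^r) > eps)."""
    return K2 * alpha ** (r / 4) / eps
```

Note that the depth depends on the network size only through $\log K_2$, since $K_2$ is proportional to $|V| + |E|$.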



6 Establishing the correlation decay property. Coupling technique

The previous section motivates the search for conditions implying the correlation decay property.
This section is devoted to the study of a coupling argument which can be used to show that
correlation decay holds. Results in this section are for the case $|\chi| = 2$. They can be extended to
the case $|\chi| \ge 2$ at the expense of heavier notation, but without much additional insight. For this
special case ($\chi = \{0, 1\}$) we introduce a set of simplifying notations as follows.
6.1 Notations

Given $\mathcal{G} = (V, E, \Phi, \{0, 1\})$ and $u \in V$, let $v_1, \dots, v_d$ be the neighbors of $u$ in $V$. For any $r \ge 0$ and
boundary conditions $C$, $C'$, define:

1. $B(r) = CE[\mathcal{G},u,r,1,C]$ and $B'(r) = CE[\mathcal{G},u,r,1,C']$.

2. For $j = 1, \dots, d$, let $\mathcal{G}_j = \mathcal{G}(u,j,1)$, and let $B_j(r-1) = CE[\mathcal{G}_j,v_j,r-1,1,C]$ and $B'_j(r-1) =
CE[\mathcal{G}_j,v_j,r-1,1,C']$. Also let $\mathbf{B}(r-1) = (B_j(r-1))_{1 \le j \le d}$ and $\mathbf{B}'(r-1) = (B'_j(r-1))_{1 \le j \le d}$.

3. For $k = 1, \dots, n_j$, let $(v_{j1}, \dots, v_{jn_j})$ be the neighbors of $v_j$ in $\mathcal{G}_j$, and let $B_{jk}(r-2) =
CE[\mathcal{G}_j(v_j,k,1),v_{jk},r-2,1,C]$ and $B'_{jk}(r-2) = CE[\mathcal{G}_j(v_j,k,1),v_{jk},r-2,1,C']$ for all $k =
1, \dots, n_j$. Also let $\mathbf{B}_j(r-2) = (B_{jk}(r-2))_{1 \le k \le n_j}$ and $\mathbf{B}'_j(r-2) = (B'_{jk}(r-2))_{1 \le k \le n_j}$.

4. For simplicity, since 1 is the only action different from the reference action 0, we denote
$\mu_{u \leftarrow v_j}(z) = \mu_{u \leftarrow v_j}(1, z)$. From equation (1), note the following alternative expression for $\mu_{u \leftarrow v_j}(z)$:

$$\mu_{u \leftarrow v_j}(z) = \Phi_{u,v_j}(1,1) - \Phi_{u,v_j}(0,1) + \max(\Phi_{u,v_j}(1,0) - \Phi_{u,v_j}(1,1), z) - \max(\Phi_{u,v_j}(0,0) - \Phi_{u,v_j}(0,1), z). \tag{14}$$

5. Similarly, for any $j = 1, \dots, d$ and $k = 1, \dots, n_j$, let $\mu_{v_j \leftarrow v_{jk}}(z) = \mu_{v_j \leftarrow v_{jk}}(1, z)$.

6. For any $\mathbf{z} = (z_1, \dots, z_d)$, let $\mu_u(\mathbf{z}) = \sum_j \mu_{u \leftarrow v_j}(z_j)$. Also, for any $j$, and any $\mathbf{z} = (z_1, \dots, z_{n_j})$,
let $\mu_{v_j}(\mathbf{z}) = \sum_{1 \le k \le n_j} \mu_{v_j \leftarrow v_{jk}}(z_k)$.

7. For any directed edge $e = (u \leftarrow v)$, denote

$$\Phi^1_e = \Phi_{u,v}(1,0) - \Phi_{u,v}(1,1), \quad \Phi^2_e = \Phi_{u,v}(0,0) - \Phi_{u,v}(0,1), \quad \Phi^3_e = \Phi_{u,v}(1,1) - \Phi_{u,v}(0,1),$$
$$Y_e = \Phi^2_e - \Phi^1_e = \Phi_{u,v}(1,1) - \Phi_{u,v}(0,1) - \Phi_{u,v}(1,0) + \Phi_{u,v}(0,0).$$

Note that $Y_{u \leftarrow v} = Y_{v \leftarrow u}$, so we simply denote it $Y_{u,v}$.
Note that for any $e$, $E|Y_e| \le K_\Phi$ (see Assumption 2). Equation (9) can be rewritten as

$$B(r) = \mu_u(\mathbf{B}(r-1)) + \Phi_u(1) - \Phi_u(0) \tag{15}$$
$$B'(r) = \mu_u(\mathbf{B}'(r-1)) + \Phi_u(1) - \Phi_u(0) \tag{16}$$

Similarly, we have

$$B_j(r-1) = \mu_{v_j}(\mathbf{B}_j(r-2)) + \Phi_{v_j}(1) - \Phi_{v_j}(0) \tag{17}$$
$$B'_j(r-1) = \mu_{v_j}(\mathbf{B}'_j(r-2)) + \Phi_{v_j}(1) - \Phi_{v_j}(0) \tag{18}$$

Finally, equation (14) can be rewritten

$$\mu_{u \leftarrow v}(z) = \Phi^3_{u \leftarrow v} + \max(\Phi^1_{u \leftarrow v}, z) - \max(\Phi^2_{u \leftarrow v}, z). \tag{19}$$

$Y_e$ represents how strongly the interaction function $\Phi_{u,v}(x_u, x_v)$ is "coupling" the variables
$x_u$ and $x_v$. In particular, if $Y_e$ is zero, the interaction function $\Phi_{u,v}(x_u, x_v)$ can be decomposed
into a sum of two potential functions $\Phi_u(x_u) + \Phi_v(x_v)$; that is, the edge between $u$ and $v$ is
then superfluous and can be removed. To see why this is the case, take $\Phi_u(0) = 0$, $\Phi_u(1) =
\Phi_{u,v}(1,0) - \Phi_{u,v}(0,0)$, $\Phi_v(0) = \Phi_{u,v}(0,0)$ and $\Phi_v(1) = \Phi_{u,v}(0,1)$, which is also equal to
$\Phi_{u,v}(1,1) - \Phi_{u,v}(1,0) + \Phi_{u,v}(0,0)$, since $Y_e = 0$.
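The equivalence between the max-based definition of the message and the closed form (19) is easy to check numerically. The sketch below is a minimal illustration for the binary case; the function names and the dictionary encoding of $\Phi_{u,v}$ are assumptions made here, and the cavity vector is represented by the single number $z = B(1)$ with $B(0) = 0$.

```python
import random


def mu_direct(phi, z):
    """mu_{u<-v}(1, B) with B(0) = 0 and B(1) = z, from the max definition."""
    return (max(phi[(1, 0)], phi[(1, 1)] + z)
            - max(phi[(0, 0)], phi[(0, 1)] + z))


def mu_closed_form(phi, z):
    """Representation (19): Phi^3 + max(Phi^1, z) - max(Phi^2, z)."""
    phi1 = phi[(1, 0)] - phi[(1, 1)]
    phi2 = phi[(0, 0)] - phi[(0, 1)]
    phi3 = phi[(1, 1)] - phi[(0, 1)]
    return phi3 + max(phi1, z) - max(phi2, z)


def coupling_strength(phi):
    """Y_e = Phi(1,1) - Phi(0,1) - Phi(1,0) + Phi(0,0)."""
    return phi[(1, 1)] - phi[(0, 1)] - phi[(1, 0)] + phi[(0, 0)]


def random_phi(rng):
    """Hypothetical interaction table with standard Gaussian entries."""
    return {(a, b): rng.gauss(0.0, 1.0) for a in (0, 1) for b in (0, 1)}
```

When `coupling_strength(phi)` is zero, $\Phi^1 = \Phi^2$ and the message no longer depends on $z$, which is the numerical counterpart of the remark that the edge is then superfluous.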

6.2 Distance-dependent coupling and correlation decay 

Definition 2. A network $\mathcal{G}$ is said to exhibit $(a,b)$-coupling with parameters $(a,b)$ if for every edge
$e = (u,v)$, and every two real values $x, x'$:

$$P\big(\mu_{u \leftarrow v}(x + \Phi_v(1) - \Phi_v(0)) = \mu_{u \leftarrow v}(x' + \Phi_v(1) - \Phi_v(0))\big) \ge (1 - a) - b|x - x'|. \tag{20}$$

The probability above, and hence the coupling parameters, depend on both $\Phi_v(1) - \Phi_v(0)$ and
the values $\Phi_{u,v}(x,y)$. Note that if for all $x, x'$

$$P\big(\mu_{u \leftarrow v}(x) = \mu_{u \leftarrow v}(x')\big) \ge (1 - a) - b|x - x'| \tag{21}$$

then $\mathcal{G}$ exhibits $(a,b)$-coupling, but in general the tightest coupling values found for equation (21)
are much weaker than the ones we would find by analyzing condition (20). This form of
distance-dependent coupling is a useful tool in proving that correlation decay occurs, as illustrated by the
following theorem:

Theorem 8. Suppose $\mathcal{G}$ exhibits $(a,b)$-coupling. If

$$a(\Delta - 1) + \sqrt{bK_\Phi}\,(\Delta - 1)^{3/2} < 1 \tag{22}$$

then the exponential correlation decay property holds with $K = \Delta^2 K_\Phi$ and
$\alpha = a(\Delta - 1) + \sqrt{bK_\Phi}\,(\Delta - 1)^{3/2}$.

Suppose $\mathcal{G}$ exhibits $(a,b)$-coupling and that there exists $K_Y > 0$ such that $|Y_e| \le K_Y$ with probability 1. If

$$a(\Delta - 1) + bK_Y(\Delta - 1)^2 < 1 \tag{23}$$

then the exponential correlation decay property holds with $\alpha = a(\Delta - 1) + bK_Y(\Delta - 1)^2$.

6.2.1 Proof of Theorem 8

We begin by proving several useful lemmas.

Lemma 1. For every $(u,v)$, and every two real values $x, x'$,

$$|\mu_{u \leftarrow v}(x) - \mu_{u \leftarrow v}(x')| \le |x - x'|. \tag{24}$$

Proof. From (14) we obtain

$$\mu_{u \leftarrow v}(x) - \mu_{u \leftarrow v}(x') = \max(\Phi_{u,v}(1,0) - \Phi_{u,v}(1,1), x) - \max(\Phi_{u,v}(0,0) - \Phi_{u,v}(0,1), x) - \max(\Phi_{u,v}(1,0) - \Phi_{u,v}(1,1), x') + \max(\Phi_{u,v}(0,0) - \Phi_{u,v}(0,1), x').$$

Using twice the relation $\max_x f(x) - \max_x g(x) \le \max_x (f(x) - g(x))$, we obtain:

$$\mu_{u \leftarrow v}(x) - \mu_{u \leftarrow v}(x') \le \max(0, x - x') + \max(0, x' - x) = |x - x'|.$$

The other inequality is proved similarly. □

Lemma 2. For every $u, v \in V$ and every two real values $x, x'$,

$$|\mu_{u \leftarrow v}(x) - \mu_{u \leftarrow v}(x')| \le |Y_{u,v}|. \tag{25}$$

Proof. Using (14) and (16), we have

$$\mu_{u \leftarrow v}(x) - (\Phi_{u,v}(1,1) - \Phi_{u,v}(0,1)) = \max(\Phi_{u,v}(1,0) - \Phi_{u,v}(1,1), x) - \max(\Phi_{u,v}(0,0) - \Phi_{u,v}(0,1), x).$$

By using the relation $\max_x f(x) - \max_x g(x) \le \max_x (f(x) - g(x))$ on the right-hand side, we obtain

$$\mu_{u \leftarrow v}(x) - (\Phi_{u,v}(1,1) - \Phi_{u,v}(0,1)) \le \max(0, -Y_{u,v}).$$

Similarly,

$$-\mu_{u \leftarrow v}(x') + (\Phi_{u,v}(1,1) - \Phi_{u,v}(0,1)) \le \max(0, Y_{u,v}).$$

Adding up,

$$\mu_{u \leftarrow v}(x) - \mu_{u \leftarrow v}(x') \le |Y_{u,v}|.$$

The other inequality is also proven similarly. □
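Lemmas 1 and 2 say that the message is 1-Lipschitz in its argument and that its total variation is at most $|Y_{u,v}|$. A small randomized check, under the assumption of Gaussian test data (any distribution would do) and with hypothetical function names, could look as follows.

```python
import random


def mu(phi, z):
    """Binary-case message from (14), with z playing the role of B(1)."""
    return (phi[(1, 1)] - phi[(0, 1)]
            + max(phi[(1, 0)] - phi[(1, 1)], z)
            - max(phi[(0, 0)] - phi[(0, 1)], z))


def lemmas_hold(trials=10000, seed=1, tol=1e-12):
    """Return True if (24) and (25) hold on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        phi = {(a, b): rng.gauss(0.0, 1.0) for a in (0, 1) for b in (0, 1)}
        y = phi[(1, 1)] - phi[(0, 1)] - phi[(1, 0)] + phi[(0, 0)]
        x, xp = rng.uniform(-3, 3), rng.uniform(-3, 3)
        gap = abs(mu(phi, x) - mu(phi, xp))
        if gap > abs(x - xp) + tol or gap > abs(y) + tol:
            return False
    return True
```

Combining the two bounds gives $|\mu(x) - \mu(x')|^2 \le |x - x'| \cdot |Y_{u,v}|$, which is the form used later in (30).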
Lemma 3. Suppose $(a,b)$-coupling holds. Then,

$$E|B(r) - B'(r)| \le a \sum_{1 \le j \le d} E|B_j(r-1) - B'_j(r-1)| + b \sum_{1 \le j \le d} E\big[|B_j(r-1) - B'_j(r-1)|^2\big]. \tag{26}$$

Proof. Using (9), we obtain:

$$E|B(r) - B'(r)| = E\Big|\Phi_u(1) - \Phi_u(0) + \sum_j \mu_{u \leftarrow v_j}(B_j(r-1)) - (\Phi_u(1) - \Phi_u(0)) - \sum_j \mu_{u \leftarrow v_j}(B'_j(r-1))\Big|$$
$$\le \sum_j E\big|\mu_{u \leftarrow v_j}(B_j(r-1)) - \mu_{u \leftarrow v_j}(B'_j(r-1))\big|$$
$$= \sum_j E\Big[ E\big[\, |\mu_{u \leftarrow v_j}(B_j(r-1)) - \mu_{u \leftarrow v_j}(B'_j(r-1))| \;\big|\; \mu_{v_j}(\mathbf{B}_j(r-2)), \mu_{v_j}(\mathbf{B}'_j(r-2)) \big] \Big].$$

By Lemma 1 we have $|\mu_{u \leftarrow v_j}(B_j(r-1)) - \mu_{u \leftarrow v_j}(B'_j(r-1))| \le |B_j(r-1) - B'_j(r-1)|$. Also note
from equations (17) and (18) that $|B_j(r-1) - B'_j(r-1)| = |\mu_{v_j}(\mathbf{B}_j(r-2)) - \mu_{v_j}(\mathbf{B}'_j(r-2))|$;
hence conditional on both $\mu_{v_j}(\mathbf{B}_j(r-2))$ and $\mu_{v_j}(\mathbf{B}'_j(r-2))$, $|B_j(r-1) - B'_j(r-1)|$ is a constant.
Therefore,

$$E\big[\, |\mu_{u \leftarrow v_j}(B_j(r-1)) - \mu_{u \leftarrow v_j}(B'_j(r-1))| \;\big|\; \mu_{v_j}(\mathbf{B}_j(r-2)), \mu_{v_j}(\mathbf{B}'_j(r-2)) \big]$$
$$\le |B_j(r-1) - B'_j(r-1)| \; P\big(\mu_{u \leftarrow v_j}(B_j(r-1)) \ne \mu_{u \leftarrow v_j}(B'_j(r-1)) \;\big|\; \mu_{v_j}(\mathbf{B}_j(r-2)), \mu_{v_j}(\mathbf{B}'_j(r-2))\big). \tag{27}$$

Note that in the $(a,b)$-coupling definition, the probability is over the values of the functions $\Phi_{u,v_j}$
and $\Phi_{v_j}$. By Proposition 3, these are independent from $\mu_{v_j}(\mathbf{B}_j(r-2))$ and $\mu_{v_j}(\mathbf{B}'_j(r-2))$. Thus, by the
$(a,b)$-coupling assumption, $P\big(\mu_{u \leftarrow v_j}(B_j(r-1)) \ne \mu_{u \leftarrow v_j}(B'_j(r-1)) \mid \mu_{v_j}(\mathbf{B}_j(r-2)), \mu_{v_j}(\mathbf{B}'_j(r-2))\big) \le
a + b|B_j(r-1) - B'_j(r-1)|$. The result then follows. □

Fix an arbitrary node $u$ in $\mathcal{G}$. Let $\mathcal{N}(u) = \{v_1, \dots, v_d\}$. Let $d_j = |\mathcal{N}(v_j)| - 1$ be the number of
neighbors of $v_j$ in $\mathcal{G}$ other than $u$ for $j = 1, \dots, d$. We need to establish that for every two boundary
conditions $C$, $C'$,

$$E\big|CE[\mathcal{G},u,r,C] - CE[\mathcal{G},u,r,C']\big| \le K\alpha^r. \tag{28}$$

We first establish the bound inductively for the case $d \le \Delta - 1$. Let $e_r$ denote the supremum of
the left-hand side of (28), where the supremum is over all networks $\mathcal{G}'$ with degree at most $\Delta$, such
that the corresponding constant $K_{\Phi'} \le K_\Phi$, over all nodes $u$ in $\mathcal{G}'$ with degree $|\mathcal{N}(u)| \le \Delta - 1$ and
over all choices of boundary conditions $C, C'$. Each condition corresponds to a different recursive
inequality for $e_r$.

Condition (22). Under (22), we claim that

$$e_r \le a(\Delta - 1)e_{r-1} + b(\Delta - 1)^3 K_\Phi e_{r-2}. \tag{29}$$

Applying (17) and (18), we have

$$|B_j(r-1) - B'_j(r-1)| \le \sum_{1 \le k \le d_j} \big|\mu_{v_j \leftarrow v_{jk}}(B_{jk}(r-2)) - \mu_{v_j \leftarrow v_{jk}}(B'_{jk}(r-2))\big|.$$

Thus,

$$|B_j(r-1) - B'_j(r-1)|^2 \le \Big( \sum_{1 \le k \le d_j} \big|\mu_{v_j \leftarrow v_{jk}}(B_{jk}(r-2)) - \mu_{v_j \leftarrow v_{jk}}(B'_{jk}(r-2))\big| \Big)^2 \le d_j \sum_{1 \le k \le d_j} \big|\mu_{v_j \leftarrow v_{jk}}(B_{jk}(r-2)) - \mu_{v_j \leftarrow v_{jk}}(B'_{jk}(r-2))\big|^2.$$

By Lemmas 1 and 2 we have $|\mu_{v_j \leftarrow v_{jk}}(B_{jk}(r-2)) - \mu_{v_j \leftarrow v_{jk}}(B'_{jk}(r-2))| \le |B_{jk}(r-2) - B'_{jk}(r-2)|$
and $|\mu_{v_j \leftarrow v_{jk}}(B_{jk}(r-2)) - \mu_{v_j \leftarrow v_{jk}}(B'_{jk}(r-2))| \le |Y_{jk}|$. Also, $d_j \le \Delta - 1$. Therefore,

$$|B_j(r-1) - B'_j(r-1)|^2 \le (\Delta - 1) \sum_{1 \le k \le d_j} |B_{jk}(r-2) - B'_{jk}(r-2)| \cdot |Y_{jk}|. \tag{30}$$

By Proposition 3, the random variables $|B_{jk}(r-2) - B'_{jk}(r-2)|$ and $|Y_{jk}|$ are independent. We
obtain:

$$E|B_j(r-1) - B'_j(r-1)|^2 \le (\Delta - 1) \sum_{1 \le k \le d_j} E|B_{jk}(r-2) - B'_{jk}(r-2)| \cdot E|Y_{jk}| \tag{31}$$
$$\le (\Delta - 1) K_\Phi \Big( \sum_{1 \le k \le d_j} E|B_{jk}(r-2) - B'_{jk}(r-2)| \Big)$$
$$\le (\Delta - 1)^2 K_\Phi e_{r-2},$$

where the second inequality follows from the definition of $K_\Phi$ and the third inequality follows from
the definition of $e_r$ and the fact that the neighbors $v_{jk}$, $1 \le k \le d_j$, of $v_j$ have degrees at most $\Delta - 1$
in the corresponding networks for which $B_{jk}(r-2)$ and $B'_{jk}(r-2)$ were defined. Applying Lemma
3 and the definition of $e_r$, we obtain

$$E|B(r) - B'(r)| \le a \sum_{1 \le j \le d} E|B_j(r-1) - B'_j(r-1)| + b \sum_{1 \le j \le d} E\big[|B_j(r-1) - B'_j(r-1)|^2\big] \le a(\Delta - 1)e_{r-1} + b(\Delta - 1)^3 K_\Phi e_{r-2}.$$

This implies (29).

From (29) we obtain that $e_r \le K\alpha^r$ for $K = \Delta K_\Phi$ and $\alpha$ given as the largest in absolute value
root of the quadratic equation $\alpha^2 = a(\Delta - 1)\alpha + b(\Delta - 1)^3 K_\Phi$. We find this root to be

$$\alpha = \tfrac{1}{2}\Big( a(\Delta - 1) + \sqrt{a^2(\Delta - 1)^2 + 4b(\Delta - 1)^3 K_\Phi} \Big) \le a(\Delta - 1) + \sqrt{b(\Delta - 1)^3 K_\Phi} < 1,$$

where the last inequality follows from assumption (22). This completes the proof for the case that
the degree $d$ of $u$ is at most $\Delta - 1$.
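The passage from the two-step recursion (29) to a geometric rate can be illustrated numerically. The sketch below computes the positive root of $\alpha^2 = a(\Delta-1)\alpha + b(\Delta-1)^3 K_\Phi$, compares it with the simpler upper bound appearing in condition (22), and iterates the recursion with equality. All function names, the initial values $e_0 = e_1 = 1$ and the sample parameters are assumptions chosen for illustration.

```python
import math


def decay_root(a, b, K_phi, Delta):
    """Largest root of alpha^2 = A*alpha + B, A = a(D-1), B = b(D-1)^3 K_phi."""
    A = a * (Delta - 1)
    B = b * (Delta - 1) ** 3 * K_phi
    return 0.5 * (A + math.sqrt(A * A + 4.0 * B))


def condition_22_lhs(a, b, K_phi, Delta):
    """Left-hand side a(D-1) + sqrt(b K_phi) (D-1)^(3/2) of condition (22)."""
    return a * (Delta - 1) + math.sqrt(b * K_phi) * (Delta - 1) ** 1.5


def iterate_recursion(a, b, K_phi, Delta, steps, e0=1.0, e1=1.0):
    """Iterate e_r = A e_{r-1} + B e_{r-2}, the worst case allowed by (29)."""
    A = a * (Delta - 1)
    B = b * (Delta - 1) ** 3 * K_phi
    seq = [e0, e1]
    for _ in range(steps):
        seq.append(A * seq[-1] + B * seq[-2])
    return seq
```

With $e_0 = e_1 = 1$, a short induction gives $e_r \le \alpha^{r-1}$ for the root $\alpha$, so the sequence decays geometrically whenever (22) holds.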

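Now suppose $d = |\mathcal{N}(u)| = \Delta$. Applying (15) and (16) we have

$$|B(r) - B'(r)| \le \sum_{1 \le j \le d} \big|\mu_{u \leftarrow v_j}(B_j(r-1)) - \mu_{u \leftarrow v_j}(B'_j(r-1))\big|.$$

Applying again Lemma 1, the right-hand side is at most $\sum_{1 \le j \le d} |B_j(r-1) - B'_j(r-1)|$, and taking expectations,

$$\sum_{1 \le j \le d} E|B_j(r-1) - B'_j(r-1)| \le \Delta e_{r-1},$$

since $B_j(r-1)$ and $B'_j(r-1)$ are defined for $v_j$ in a subnetwork $\mathcal{G}_j = \mathcal{G}(u,j,1)$, where $v_j$ has degree
at most $\Delta - 1$. Thus again the correlation decay property holds for $u$ with $\Delta K$ replacing $K$.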

Condition (23). Recall from Lemma 3 that for all $r$, we have:

$$E|B(r) - B'(r)| \le a \sum_{1 \le j \le d} E|B_j(r-1) - B'_j(r-1)| + b \sum_{1 \le j \le d} E\big[|B_j(r-1) - B'_j(r-1)|^2\big].$$

For all $j$, $|B_j(r-1) - B'_j(r-1)| = \big|\sum_k \big(\mu_{v_j \leftarrow v_{jk}}(B_{jk}) - \mu_{v_j \leftarrow v_{jk}}(B'_{jk})\big)\big|$. Moreover, for each $j, k$,
$|\mu_{v_j \leftarrow v_{jk}}(B_{jk}) - \mu_{v_j \leftarrow v_{jk}}(B'_{jk})| \le |Y_{jk}| \le K_Y$ (the first inequality follows from Lemma 2, the second
by assumption). As a result,

$$|B_j(r-1) - B'_j(r-1)|^2 \le (\Delta - 1) K_Y |B_j(r-1) - B'_j(r-1)|.$$

We obtain:

$$e_r \le (a + bK_Y(\Delta - 1))(\Delta - 1) e_{r-1}.$$

Since $a(\Delta - 1) + bK_Y(\Delta - 1)^2 < 1$, $e_r$ goes to zero exponentially fast. The same reasoning as
previously shows that this property implies correlation decay.

6.3 Establishing coupling bounds 
6.3.1 Coupling Lemma 

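Theorem 8 details sufficient conditions under which the distance-dependent coupling induces
correlation decay (and thus efficient decentralized algorithms, via Proposition 5 and Theorem 7).
It remains to show how we can prove coupling bounds. The following simple observation can be
used to achieve this goal.

For any edge $(u,v) \in \mathcal{G}$, and any two real numbers $x, x'$, consider the following events:

$$E^+_{u \leftarrow v}(x,x') = \{\min(x,x') + \Phi_v(1) - \Phi_v(0) \ge \max(\Phi^1_{u \leftarrow v}, \Phi^2_{u \leftarrow v})\}$$
$$E^-_{u \leftarrow v}(x,x') = \{\max(x,x') + \Phi_v(1) - \Phi_v(0) \le \min(\Phi^1_{u \leftarrow v}, \Phi^2_{u \leftarrow v})\}$$
$$E_{u \leftarrow v}(x,x') = E^+_{u \leftarrow v}(x,x') \cup E^-_{u \leftarrow v}(x,x')$$

Lemma 4. If $E_{u \leftarrow v}(x,x')$ occurs, then $\mu_{u \leftarrow v}(x + \Phi_v(1) - \Phi_v(0)) = \mu_{u \leftarrow v}(x' + \Phi_v(1) - \Phi_v(0))$.
Therefore

$$P\big(\mu_{u \leftarrow v}(x + \Phi_v(1) - \Phi_v(0)) = \mu_{u \leftarrow v}(x' + \Phi_v(1) - \Phi_v(0))\big) \ge P(E_{u \leftarrow v}(x,x')).$$

Proof. From representation (19), we have $\mu_{u \leftarrow v}(z) = \Phi^3_{u \leftarrow v} + \max(\Phi^1_{u \leftarrow v}, z) - \max(\Phi^2_{u \leftarrow v}, z)$; let
$x, x'$ be any two reals. If both $x$ and $x'$ are greater than both $\Phi^1_{u \leftarrow v}$ and $\Phi^2_{u \leftarrow v}$, then $\mu_{u \leftarrow v}(x) =
\Phi^3_{u \leftarrow v} = \mu_{u \leftarrow v}(x')$. If both $x$ and $x'$ are smaller than both $\Phi^1_{u \leftarrow v}$ and $\Phi^2_{u \leftarrow v}$, then $\mu_{u \leftarrow v}(x) =
\Phi^3_{u \leftarrow v} + \Phi^1_{u \leftarrow v} - \Phi^2_{u \leftarrow v} = \mu_{u \leftarrow v}(x')$. The result follows from applying the above observation to
$x + \Phi_v(1) - \Phi_v(0)$ and $x' + \Phi_v(1) - \Phi_v(0)$. □

Note that Lemma 4 implies that the probability of coupling not occurring, $P(\mu_{u \leftarrow v}(x + \Phi_v(1) -
\Phi_v(0)) \ne \mu_{u \leftarrow v}(x' + \Phi_v(1) - \Phi_v(0)))$, is upper bounded by the probability of $(E_{u \leftarrow v}(x,x'))^c$. When
obvious from context, we drop the subscript $u \leftarrow v$. We will often use the following description of
$(E(x,x'))^c$: for two real values $x \ge x'$,

$$(E(x,x'))^c = \{\min(\Phi^1, \Phi^2) + \Phi_v(0) - \Phi_v(1) < x < \max(\Phi^1, \Phi^2) + \Phi_v(0) - \Phi_v(1) + x - x'\}. \tag{32}$$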
6.3.2 Uniform Distribution. Proof of Theorem 1

In order to prove Theorem 1, we compute the coupling parameters $a, b$ for this distribution and
apply the second form of Theorem 8.

Lemma 5. The network with uniformly distributed rewards described in Section 3.1 exhibits $(a,b)$-coupling
with $a = \frac{I_2}{2I_1}$ and $b = \frac{1}{2I_1}$.

Proof. For any fixed edge $(u,v) \in \mathcal{G}$, $\Phi^1_{u \leftarrow v}$ and $\Phi^2_{u \leftarrow v}$ are i.i.d. random variables with a triangular
distribution with support $[-2I_2, 2I_2]$ and mode 0. Because $\Phi^1_{u \leftarrow v}$ and $\Phi^2_{u \leftarrow v}$ are i.i.d., by symmetry
we obtain:

$$P\big((E(x,x'))^c\big) = 2 \int_{-2I_2}^{2I_2} dP_{\Phi^1}(a_1) \int_{a_1}^{2I_2} dP_{\Phi^2}(a_2)\, P\big(x' - a_2 < \Phi_v(0) - \Phi_v(1) < x - a_1\big).$$

$P(x' - a_2 < \Phi_v(0) - \Phi_v(1) < x - a_1)$ can be upper bounded by $\frac{x - x' + a_2 - a_1}{2I_1}$, and we obtain:

$$P\big((E(x,x'))^c\big) \le \frac{|x - x'|}{2I_1} + \frac{1}{I_1} \int_{-2I_2}^{2I_2} dP_{\Phi^1}(a_1) \int_{a_1}^{2I_2} dP_{\Phi^2}(a_2)\,(a_2 - a_1).$$

Note that $dP_{\Phi^2}(a_2) = \frac{1}{4I_2^2}(a_2 + 2I_2)\, d(a_2)$ for $a_2 \le 0$, and $dP_{\Phi^2}(a_2) = \frac{1}{4I_2^2}(2I_2 - a_2)\, d(a_2)$ for
$a_2 \ge 0$; identical expressions hold for $dP_{\Phi^1}(a_1)$. Therefore, for $a_1 \ge 0$,

$$\int_{a_1}^{2I_2} dP_{\Phi^2}(a_2)(a_2 - a_1) = \frac{1}{4I_2^2} \int_{a_1}^{2I_2} (2I_2 - a_2)(a_2 - a_1)\, d(a_2)$$
$$= \frac{1}{4I_2^2} \Big( -\int_{a_1}^{2I_2} (2I_2 - a_2)^2\, d(a_2) + (2I_2 - a_1) \int_{a_1}^{2I_2} (2I_2 - a_2)\, d(a_2) \Big) = \frac{1}{24 I_2^2} (2I_2 - a_1)^3.$$

Similarly, for $a_1 \le 0$,

$$\int_{a_1}^{2I_2} dP_{\Phi^2}(a_2)(a_2 - a_1) = -a_1 + \frac{1}{24 I_2^2}(a_1 + 2I_2)^3.$$

The final integral is therefore equal to:

$$\int_{-2I_2}^{2I_2} dP_{\Phi^1}(a_1) \int_{a_1}^{2I_2} dP_{\Phi^2}(a_2)(a_2 - a_1)
= \frac{1}{4I_2^2} \Big( \int_{-2I_2}^{0} (a_1 + 2I_2)\Big(-a_1 + \frac{1}{24 I_2^2}(a_1 + 2I_2)^3\Big) d(a_1) + \int_{0}^{2I_2} \frac{1}{24 I_2^2}(2I_2 - a_1)^4\, d(a_1) \Big)$$
$$= \frac{1}{4I_2^2} \Big( \frac{4}{3} I_2^3 + \frac{4}{15} I_2^3 + \frac{4}{15} I_2^3 \Big) = \frac{7}{15} I_2.$$

Finally,

$$P\big((E(x,x'))^c\big) \le \frac{7 I_2}{15 I_1} + \frac{|x - x'|}{2I_1} \le \frac{I_2}{2I_1} + \frac{|x - x'|}{2I_1}.$$

Therefore, the system exhibits coupling with parameters $\big(\frac{I_2}{2I_1}, \frac{1}{2I_1}\big)$. □

We can now finish the proof of Theorem 1. For all $(u,v) \in E$ and $x, y \in \chi$, $|\Phi_{u,v}(x,y)| \le I_2$.
Therefore, for any $(u,v)$, $|Y_{u,v}| = |\Phi_{u,v}(1,1) - \Phi_{u,v}(0,1) - \Phi_{u,v}(1,0) + \Phi_{u,v}(0,0)| \le 4I_2$.

Since $(\Delta - 1) \le (\Delta - 1)^2$, if $\beta(\Delta - 1)^2 < 1$ we also have $\frac{I_2}{2I_1}(\Delta - 1) + \frac{2I_2}{I_1}(\Delta - 1)^2 < 1$.
This is exactly condition (23) with $a, b$ as given by Lemma 5 and $K_Y = 4I_2$. It follows that $\mathcal{G}$ exhibits
exponential correlation decay, and since Assumptions 1 and 2 hold, all conditions of Corollary 2
are satisfied, and there exists an additive FPTAS for computing $J_{\mathcal{G}}$.
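The double integral in the proof of Lemma 5 is the expectation $E[(\Phi^2 - \Phi^1)^+]$ for two independent triangular variables on $[-2I_2, 2I_2]$, and a direct calculation gives $7I_2/15$ for it. The Monte Carlo sketch below estimates that expectation by sampling each triangular variable as a difference of two uniforms on $[-I_2, I_2]$. The function name, sample size and seed are arbitrary assumptions; the estimate is only accurate up to sampling error.

```python
import random


def mean_positive_gap(I2, samples=200_000, seed=0):
    """Monte Carlo estimate of E[max(Phi^2 - Phi^1, 0)].

    Phi^1 and Phi^2 are each the difference of two independent
    Uniform[-I2, I2] rewards, hence triangular on [-2*I2, 2*I2].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        phi1 = rng.uniform(-I2, I2) - rng.uniform(-I2, I2)
        phi2 = rng.uniform(-I2, I2) - rng.uniform(-I2, I2)
        total += max(phi2 - phi1, 0.0)
    return total / samples
```

The estimate scales linearly in $I_2$, so comparing `mean_positive_gap(1.0)` with $7/15 \approx 0.4667$ is enough to check the constant.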



6.3.3 Gaussian distribution. Proof of Theorem 2

In this section, we compute the coupling parameters for Gaussian distributed reward functions.
Rather than considering only the assumptions of Theorem 2, we place ourselves in a more general
framework. The proof will then follow from the application of Theorem 8 (first condition) and a
special case of the computation detailed below (see Corollary 3). Assume that for every edge $e =
(u,v)$ the value functions $(\Phi_{u,v}(0,0), \Phi_{u,v}(0,1), \Phi_{u,v}(1,0), \Phi_{u,v}(1,1))$ are independent, identically
distributed four-dimensional Gaussian random variables, with mean $\mu = (\mu_i)_{i \in \{00,01,10,11\}}$ and
covariance matrix $S = (S_{ij})_{i,j \in \{00,01,10,11\}}$. For every node $v \in V$, suppose $\Phi_v(1) = 0$ and that
$\Phi_v(0)$ is a Gaussian random variable with mean $\mu_p$ and standard deviation $\sigma_p$. Moreover, suppose
all the $\Phi_v$ and $\Phi_e$ are independent for $v \in V$, $e \in E$. Let

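$$\sigma_1^2 = S_{10,10} - 2S_{10,11} + S_{11,11} + \sigma_p^2, \qquad \sigma_2^2 = S_{00,00} - 2S_{00,01} + S_{01,01} + \sigma_p^2,$$

$$\rho = (\sigma_1 \sigma_2)^{-1}\big(S_{00,10} - S_{00,11} - S_{01,10} + S_{01,11} + \sigma_p^2\big), \qquad C = \frac{\sigma_2^2 - \sigma_1^2}{\sigma_X \sigma_Y},$$

$$\sigma_X^2 = \sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2, \qquad \sigma_Y^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2.$$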

Proposition 6. Assume C < 1. Then the network exhibits coupling with parameters {a,b) equal 
to: 



1 ^ / / 1 cry\ , /2 /Uoo + ^11 -^10 -m 

■ arctan \ / —77 + 



vr V V 1 — ax ) V vr ax 

V vrcjx 

Corollary 3. Suppose i/iai /or eac/i e,($e(0, 0), $e(0, 1), $6(1, 0), <I>e(l> 1)) are i.i.d. Gaussian vari- 
ables with mean and standard deviation cTg. Let j3 = \ —rf-^ Then a < j3 and bK^ < (3. 
Proof. Under the conditions of Corollary 3, we have $\sigma_Y^2 = 4\sigma_e^2$, $\sigma_X^2 = 4\sigma_p^2 + 4\sigma_e^2$, and $C = 0$. Note
also that < 2(Te By Proposition [U the network exhibits coupling with parameters 



1 / ^ 1 

arctan I ^ / — - — ^ ) < —(3 < fi 




and so, < \l-l3 < /3 



□ 



Remark that when $\sigma_e \to 0$, $\beta \to 0$ and correlation decay takes place; moreover, combining
Corollary 3 and Theorem 8 (condition (22)) directly yields Theorem 2.



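Proof of Proposition 6. Fix an edge $(u,v)$ in $E$; for simplicity, in the rest of this section denote
$\tilde{\Phi}^1 = \Phi^1_{u \leftarrow v} + \Phi_v(0) - \Phi_v(1)$ and $\tilde{\Phi}^2 = \Phi^2_{u \leftarrow v} + \Phi_v(0) - \Phi_v(1)$. It follows that $(\tilde{\Phi}^1, \tilde{\Phi}^2)$ follows a
bivariate Gaussian distribution with mean $(\mu_1, \mu_2)$:

$$\mu_1 = \mu_{10} - \mu_{11} + \mu_p \quad \text{and} \quad \mu_2 = \mu_{00} - \mu_{01} + \mu_p,$$

and covariance matrix

$$S_\Phi = \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$

with $\rho$ the correlation coefficient of $(\tilde{\Phi}^1, \tilde{\Phi}^2)$.

Let $X = \tilde{\Phi}^1 + \tilde{\Phi}^2$, $Y = \tilde{\Phi}^2 - \tilde{\Phi}^1$. Then, $(X, Y)$ is a bivariate Gaussian vector with means $E[X] =
\mu_1 + \mu_2$ and $E[Y] = \mu_2 - \mu_1$, standard deviations $\sigma_X, \sigma_Y$ and correlation $C$ as defined previously.
Denote also $\bar{X} = X - E[X]$ and $\bar{Y} = Y - E[Y]$ the centered versions of $X$ and $Y$. Consider two
real numbers $x \ge x'$, and let $(b, t)$ be the two real numbers such that $x = b + t/2$, $x' = b - t/2$.
From equation (32), we have

$$(E(x,x'))^c = \{\min(\tilde{\Phi}^1, \tilde{\Phi}^2) - t/2 < b < \max(\tilde{\Phi}^1, \tilde{\Phi}^2) + t/2\}.$$

The first step of the proof consists in rewriting the event $(E(x,x'))^c$ in terms of the variables $X, Y$:

Lemma 6.

$$(E(x,x'))^c = \{|Y| > |X - 2b| - t\}.$$

Proof.

$$(E(x,x'))^c = \{\min(\tilde{\Phi}^1, \tilde{\Phi}^2) - t/2 < b < \max(\tilde{\Phi}^1, \tilde{\Phi}^2) + t/2\}$$
$$= \{\tilde{\Phi}^1 - t/2 < b < \tilde{\Phi}^2 + t/2,\ \tilde{\Phi}^1 \le \tilde{\Phi}^2\} \cup \{\tilde{\Phi}^2 - t/2 < b < \tilde{\Phi}^1 + t/2,\ \tilde{\Phi}^2 < \tilde{\Phi}^1\}$$
$$= \{2\tilde{\Phi}^1 - t < 2b < 2\tilde{\Phi}^2 + t,\ \tilde{\Phi}^1 \le \tilde{\Phi}^2\} \cup \{2\tilde{\Phi}^2 - t < 2b < 2\tilde{\Phi}^1 + t,\ \tilde{\Phi}^2 < \tilde{\Phi}^1\}$$
$$= \{X - Y - t < 2b < X + Y + t,\ Y \ge 0\} \cup \{X + Y - t < 2b < X - Y + t,\ Y < 0\}$$
$$= \{(X - 2b) - |Y| - t < 0 < (X - 2b) + |Y| + t\}$$
$$= \{|Y| > (X - 2b - t)\} \cap \{|Y| > (2b - X - t)\}$$
$$= \{|Y| > |X - 2b| - t\}.$$

□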
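The chain of set identities in Lemma 6 can be sanity-checked by sampling: for random values of the two shifted rewards, of $b$ and of $t \ge 0$, the interval description of the event and its $(X, Y)$ description should agree. The sketch below does this; the function name and the sampling ranges are arbitrary assumptions, and ties on the boundary have probability zero.

```python
import random


def descriptions_agree(trials=20000, seed=2):
    """Compare the two descriptions of the non-coupling event in Lemma 6."""
    rng = random.Random(seed)
    for _ in range(trials):
        p1, p2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b = rng.uniform(-3.0, 3.0)
        t = rng.uniform(0.0, 2.0)
        # Interval form: min - t/2 < b < max + t/2
        interval_form = min(p1, p2) - t / 2 < b < max(p1, p2) + t / 2
        # (X, Y) form with X = p1 + p2 and Y = p2 - p1
        X, Y = p1 + p2, p2 - p1
        xy_form = abs(Y) > abs(X - 2 * b) - t
        if interval_form != xy_form:
            return False
    return True
```

The identity rests on $\min(\tilde{\Phi}^1, \tilde{\Phi}^2) = (X - |Y|)/2$ and $\max(\tilde{\Phi}^1, \tilde{\Phi}^2) = (X + |Y|)/2$.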
For any $b$ and $t \ge 0$, let $S(t) = \{(x, y) : |y| > |x| - t\}$, and for any real $y$, let $S(t, y) = \{x : |y| >
|x| - t\}$. Note $S(t,y)$ is symmetric and convex in $x$ for all $y$. Using the lemma, we obtain:

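$$P(E^c(x,x')) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-C^2}} \int_{S(t)} \exp\left(-\frac{1}{2(1-C^2)}\left[\frac{(x-\mu_1-\mu_2+2b)^2}{\sigma_X^2} + \frac{(y-\mu_2+\mu_1)^2}{\sigma_Y^2} - \frac{2C(x-\mu_1-\mu_2+2b)(y-\mu_2+\mu_1)}{\sigma_X\sigma_Y}\right]\right) dx\,dy$$

$$= \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-C^2}} \int_y \exp\left(-\frac{(y-\mu_2+\mu_1)^2}{2(1-C^2)\sigma_Y^2}\right) g(y)\,dy \quad (33)$$

where: 

$$g(y) = \int_{x \in S(t,y)} \exp\left(-\frac{1}{2(1-C^2)}\left[\frac{(x-\mu_1-\mu_2+2b)^2}{\sigma_X^2} - \frac{2C(x-\mu_1-\mu_2+2b)(y-\mu_2+\mu_1)}{\sigma_X\sigma_Y}\right]\right) dx$$

Let $x_b = \frac{x-\mu_1-\mu_2+2b}{\sigma_X}$ and $\bar{y} = \frac{y-\mu_2+\mu_1}{\sigma_Y}$. Then 

$$g(y) = \exp\left(\frac{C^2}{2(1-C^2)}\bar{y}^2\right) \int_{x \in S(t,y)} \exp\left(-\frac{1}{2(1-C^2)}(x_b - C\bar{y})^2\right) dx$$

Now, $x_b - C\bar{y} = \frac{x - \mu_1 - \mu_2 + 2b - C\sigma_X\bar{y}}{\sigma_X}$. Recall Anderson's inequality [Dud99]: let $\gamma$ be a centered 
Gaussian measure on $\mathbb{R}^d$, and $S$ be a convex, symmetric subset of $\mathbb{R}^d$. Then, for all $z$, $\gamma(S) \ge 
\gamma(S + z)$. Since $S(t,y)$ is a convex symmetric subset, by setting $2b = \mu_1 + \mu_2 + C\sigma_X\bar{y}$ it 
follows that 

$$g(y) \le \exp\left(\frac{C^2}{2(1-C^2)}\bar{y}^2\right) \int_{x \in S(t,y)} \exp\left(-\frac{x^2}{2\sigma_X^2(1-C^2)}\right) dx$$

Injecting that bound in equation (33), we obtain: 

$$\begin{aligned}
P(E^c(x,x')) &\le \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-C^2}} \int_y \exp\left(-\frac{(y-\mu_2+\mu_1)^2}{2(1-C^2)\sigma_Y^2}\right) \exp\left(\frac{C^2(y-\mu_2+\mu_1)^2}{2(1-C^2)\sigma_Y^2}\right) \int_{x \in S(t,y)} \exp\left(-\frac{x^2}{2\sigma_X^2(1-C^2)}\right) dx\,dy \\
&= \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-C^2}} \int_{S(t)} \exp\left(-\frac{1}{2(1-C^2)}\left(\frac{x^2}{\sigma_X^2} + (1-C^2)\frac{(y-\mu_2+\mu_1)^2}{\sigma_Y^2}\right)\right) dx\,dy
\end{aligned}$$

Finally, note that by the triangular inequality, for any $a$ we have $S(t) \subset S_a(t) = \{(x,y) : |y - a| > 
|x| - t - |a|\}$. We obtain: 

$$\begin{aligned}
P(E^c(x,x')) &\le \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-C^2}} \int_{S_{\mu_2-\mu_1}(t)} \exp\left(-\frac{1}{2(1-C^2)}\left(\frac{x^2}{\sigma_X^2} + (1-C^2)\frac{(y-\mu_2+\mu_1)^2}{\sigma_Y^2}\right)\right) dx\,dy \\
&\le \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-C^2}} \int_{S(t+|\mu_2-\mu_1|)} \exp\left(-\frac{1}{2(1-C^2)}\left(\frac{x^2}{\sigma_X^2} + (1-C^2)\frac{y^2}{\sigma_Y^2}\right)\right) dx\,dy
\end{aligned}$$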
where the second inequality follows from a simple change of variable. Let $t' = t + |\mu_2 - \mu_1|$. Finally, 
we decompose $S(t')$ as the union of two sets: $S(t') = S_{int}(t') \cup S_{out}(t')$, where: 

$$S_{int}(t') = \{(x,y) : |x| \le t'\}$$

$$S_{out}(t') = \{(x,y) : |x| > t' \text{ and } |y| > |x| - t'\},$$

and note that $S_{int}(t') \cap S_{out}(t') = \emptyset$. We have: 

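$$P(S_{int}(t')) \le \frac{2t'}{\sigma_X\sqrt{2\pi(1-C^2)}}$$

and, by symmetry of $S_{out}(t')$ in $x$ and $y$, 

$$\begin{aligned}
P(S_{out}(t')) &= 4P(\{(x,y) : x > t',\ y > 0,\ y > x - t'\}) \\
&= \frac{2}{\pi\sigma_X\sigma_Y\sqrt{1-C^2}} \int_{\{(x,y) : x > t',\, y > 0,\, y > x - t'\}} \exp\left(-\frac{x^2}{2(1-C^2)\sigma_X^2} - \frac{y^2}{2\sigma_Y^2}\right) dx\,dy
\end{aligned}$$

Using the change of variables $(x', y') = \left(\frac{x - t'}{\sqrt{2(1-C^2)\sigma_X^2}}, \frac{y}{\sqrt{2\sigma_Y^2}}\right)$, we get: 

$$P(S_{out}(t')) = \frac{4}{\pi} \int_{\{(x',y') : x' > 0,\, y' > 0,\, y' > \frac{\sigma_X\sqrt{1-C^2}}{\sigma_Y}x'\}} \exp\left(-\left(x' + \frac{t'}{\sqrt{2(1-C^2)\sigma_X^2}}\right)^2 - y'^2\right) dx'\,dy'$$

Since $\left(x' + \frac{t'}{\sqrt{2(1-C^2)\sigma_X^2}}\right)^2 \ge x'^2$, it follows that: 

$$P(S_{out}(t')) \le \frac{4}{\pi} \int_{\{(x',y') : x' > 0,\, y' > 0,\, y' > \frac{\sigma_X\sqrt{1-C^2}}{\sigma_Y}x'\}} \exp(-x'^2 - y'^2)\,dx'\,dy'$$

By using a radial change of variables $(x', y') = (r\cos(\theta), r\sin(\theta))$ we can compute exactly the 
expression above, and find: 

$$\begin{aligned}
P(S_{out}(t')) &\le \frac{4}{\pi} \int_{\{(r,\theta) : r \ge 0,\ \arctan\left(\frac{\sigma_X\sqrt{1-C^2}}{\sigma_Y}\right) \le \theta \le \frac{\pi}{2}\}} \exp(-r^2)\,r\,dr\,d\theta \\
&= 1 - \frac{2}{\pi}\arctan\left(\frac{\sigma_X\sqrt{1-C^2}}{\sigma_Y}\right)
\end{aligned}$$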

which gives us the desired bounds on $(a, b)$. □ 

7 Maximum weighted independent sets 
7.1 Cavity expansion and the algorithm 

In this section, we show how the correlation decay framework also applies to MWIS problems 
and prove theorems 3, 4, and 5. There are additional challenges in achieving this goal. First, 
the bounded costs assumption required for the results of section [5] does not hold for constrained 
optimization problems, as the underlying problem has infinite costs. Second, the coupling technique 
of section [6] is not readily applicable for MWIS. We therefore develop a different approach. 

As for unconstrained optimization problems, we follow three steps. First, we detail the Cavity 
Expansion algorithm. Second, we establish the correlation decay property. Finally, we show that the 
correlation decay property implies that near-optimal, decentralized optimization can be performed 
in polynomial time. 

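Consider a general weighted graph $\mathcal{G} = (V, E, W)$, where $(V, E)$ is a graph whose nodes are 
equipped with arbitrary non-negative weights $W_i, i \in V$; no probabilistic assumption on $W_i$ is 
adopted yet. Note that for Independent Sets problems, we have $J_{\mathcal{G}} = W(I^*)$, and for any $(i_1, \dots, i_d)$, 
$J_{\mathcal{G},(i_1,\dots,i_d)}(\mathbf{0}) = J_{\mathcal{G}\setminus\{i_1,\dots,i_d\}}$, where $\mathcal{G}\setminus\{i_1,\dots,i_d\}$ is the subgraph induced by nodes $V \setminus \{i_1, \dots, i_d\}$. 

Consider a given node $i \in V$ and let $N(i) = \{i_1, \dots, i_d\}$. From Theorem [6] we have 

$$B_{\mathcal{G},i} = J_{\mathcal{G},i}(1) - J_{\mathcal{G},i}(0) = W_i + \sum_{1 \le l \le d} \mu_{i \leftarrow i_l}(1, B_{\mathcal{G}(i,l),i_l}) \quad (35)$$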

Recall that for MWIS, we have $\Phi_e(x,y) = -\infty$ for $(x,y) = (1,1)$ and $\Phi_e(x,y) = 0$ otherwise. 
Therefore, by definition of $\mu_{i \leftarrow i_l}$, we have 

$$\mu_{i \leftarrow i_l}(1, B_{\mathcal{G}(i,l),i_l}) = \max(-\infty + B_{\mathcal{G}(i,l),i_l}, 0) - \max(B_{\mathcal{G}(i,l),i_l}, 0) = -\max(B_{\mathcal{G}(i,l),i_l}, 0)$$

Thus, 

$$B_{\mathcal{G},i} = W_i - \sum_{l=1}^{d} \max(B_{\mathcal{G}(i,l),i_l}, 0)$$

Let $l \le d$; recall the definition of $\mathcal{G}(i,l)$: $\mathcal{G}(i,l)$ is the network $\mathcal{G} \setminus \{i\}$, where the potential 
functions of the neighbors of $i$ have been modified as follows: 

• for $v \in \{i_1, \dots, i_{l-1}\}$, $\Phi'_v(0) = \Phi_v(0) + \Phi_{i,v}(1,0) = 0$, and $\Phi'_v(1) = W_v + \Phi_{i,v}(1,1) = W_v - \infty = 
-\infty$. Since the new weight of $v$ is $-\infty$, it is equivalent to removing this node from the graph. 

• for $v \in \{i_{l+1}, \dots, i_d\}$, $\Phi'_v(0) = \Phi_v(0) + \Phi_{i,v}(0,0) = 0$, and $\Phi'_v(1) = W_v + \Phi_{i,v}(0,1) = W_v$. 

We thus observe that in $\mathcal{G}(i,l)$, the nodes $\{i, i_1, \dots, i_{l-1}\}$ can be removed, while the weights of 
nodes $\{i_{l+1}, \dots, i_d\}$ are unchanged; equivalently, we have $\mathcal{G}(i,l) = \mathcal{G} \setminus \{i, i_1, \dots, i_{l-1}\}$. Therefore, 
we obtain 

$$B_{\mathcal{G},i} = W_i - \sum_{l=1}^{d} \max(B_{\mathcal{G}\setminus\{i,i_1,\dots,i_{l-1}\},i_l}, 0)$$

We further modify the cavity recursion by the following change of variable: for any graph $\mathcal{G}$ and 
node $i$, let $C_{\mathcal{G}}(i) = \max(B_{\mathcal{G},i}, 0)$; note we have $C_{\mathcal{G}}(i) = \max(J_{\mathcal{G},i}(1), J_{\mathcal{G},i}(0)) - J_{\mathcal{G},i}(0) = J_{\mathcal{G}} - J_{\mathcal{G}\setminus\{i\}}$. 
The variables $C$ will be called cavities. It turns out that in the case of IS problems, working with 
cavities $C$ is more convenient than with cavities $B$. We obtain the cavity recursion for MWIS: 

Proposition 7. For any $i \in V$, let $N(i) = \{i_1, \dots, i_d\}$. Then 

$$C_{\mathcal{G}}(i) = \max\Big(0,\ W_i - \sum_{1 \le l \le d} C_{\mathcal{G}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l)\Big), \quad (36)$$

where $\sum_{1 \le l \le d} = 0$ when $N(i) = \emptyset$. If $W_i - \sum_{1 \le l \le d} C_{\mathcal{G}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l) > 0$, namely $C_{\mathcal{G}}(i) > 0$, then 
every largest weight independent set must contain $i$. Similarly if $W_i - \sum_{1 \le l \le d} C_{\mathcal{G}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l) < 0$, 
implying $C_{\mathcal{G}}(i) = 0$, then every largest weight independent set does not contain $i$. 

Remark: The proposition leaves out a "fuzzy" case $W_i - \sum_{1 \le l \le d} C_{\mathcal{G}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l) = 0$. This 
will not be a problem in our setting since, due to the continuity of the weight distribution, the 
probability of this event is zero. Modulo this tie, the events $C_{\mathcal{G}}(i) > 0$ ($C_{\mathcal{G}}(i) = 0$) determine 
whether $i$ must (must not) belong to the largest weighted independent set. 
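Recursion (36) and the identity $C_{\mathcal{G}}(i) = J_{\mathcal{G}} - J_{\mathcal{G}\setminus\{i\}}$ can be illustrated on a small example. The following is a minimal sketch, not the implementation used in the paper: the graph `ADJ`, the weights `WEIGHTS`, and the function names `cavity` and `max_weight_is` are all made up for illustration, and the brute-force enumeration is only practical for tiny graphs.

```python
from itertools import combinations

# Hypothetical 5-node example: adjacency sets and fixed node weights.
ADJ = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4}, 3: {1, 4}, 4: {2, 3}}
WEIGHTS = {0: 1.3, 1: 0.7, 2: 2.1, 3: 0.4, 4: 1.9}


def cavity(adj, weights, i, removed=frozenset()):
    """C(i) in the graph with `removed` nodes deleted, computed via recursion (36)."""
    gone = set(removed) | {i}          # nodes deleted before visiting each neighbor
    total = 0.0
    for v in sorted(adj[i]):
        if v in removed:
            continue
        # Cavity of neighbor i_l in G \ {i, i_1, ..., i_{l-1}}.
        total += cavity(adj, weights, v, frozenset(gone))
        gone.add(v)
    return max(0.0, weights[i] - total)


def max_weight_is(adj, weights, removed=frozenset()):
    """Brute-force weight of a maximum weight independent set (exponential time)."""
    nodes = [v for v in adj if v not in removed]
    best = 0.0
    for mask in range(1 << len(nodes)):
        chosen = [nodes[k] for k in range(len(nodes)) if (mask >> k) & 1]
        if all(u not in adj[v] for u, v in combinations(chosen, 2)):
            best = max(best, sum(weights[v] for v in chosen))
    return best
```

For every node `i`, `cavity(ADJ, WEIGHTS, i)` should agree, up to rounding, with `max_weight_is(ADJ, WEIGHTS) - max_weight_is(ADJ, WEIGHTS, frozenset({i}))`.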

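Using the special form of the cavity recursion (36), the cavity expansion algorithm for MWIS 
is very similar to the one defined in section [4.3]. For any induced subgraph $\mathcal{H}$ of $\mathcal{G}$ and node $i$, 
let $C^-_{\mathcal{H}}(i,r) = \max(0, CE[\mathcal{H}, i, r])$ with boundary condition $CE[\mathcal{H}, i, 0] = 0$, and let $C^+_{\mathcal{H}}(i,r)$ be the 
same quantity for the boundary condition $CE[\mathcal{H}, i, 0] = W_i$. Alternatively, $C^-$ and $C^+$ can be 
defined by the following recursions: 

$$C^-_{\mathcal{H}}(i,r) = \begin{cases} 0, & r = 0; \\ \max\big(0,\ W_i - \sum_{1 \le l \le d} C^-_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1)\big), & r \ge 1. \end{cases} \quad (37)$$

$$C^+_{\mathcal{H}}(i,r) = \begin{cases} W_i, & r = 0; \\ \max\big(0,\ W_i - \sum_{1 \le l \le d} C^+_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1)\big), & r \ge 1. \end{cases} \quad (38)$$

The two boundary conditions were chosen so that $C^-_{\mathcal{H}}(i,r)$ and $C^+_{\mathcal{H}}(i,r)$ provide valid bounds 
on the true cavities $C_{\mathcal{H}}(i)$, as detailed by the following Lemma. 

Lemma 7. For every even $r$ 

$$C^-_{\mathcal{H}}(i,r) \le C_{\mathcal{H}}(i) \le C^+_{\mathcal{H}}(i,r),$$

and for every odd $r$ 

$$C^+_{\mathcal{H}}(i,r) \le C_{\mathcal{H}}(i) \le C^-_{\mathcal{H}}(i,r).$$

Proof. The proof is by induction in $r$. The assertion holds by definition of $C^-, C^+$ for $r = 0$. The 
induction follows from (36), the definitions of $C^-, C^+$ and since the function $x \mapsto \max(0, W - x)$ is 
non-increasing. □ 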
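The alternating bounds of Lemma 7 can be observed directly on a small graph. The sketch below is illustrative only: `cavity_truncated`, `exact_cavity`, `sandwich_holds`, and the example graph are hypothetical names and data. It relies on the observation that every recursive call removes at least one more node, so a depth equal to the number of nodes never reaches the boundary condition and returns the exact cavity.

```python
# Hypothetical 5-node example: adjacency sets and fixed node weights.
ADJ = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4}, 3: {1, 4}, 4: {2, 3}}
WEIGHTS = {0: 1.3, 1: 0.7, 2: 2.1, 3: 0.4, 4: 1.9}


def cavity_truncated(adj, weights, i, r, plus, removed=frozenset()):
    """C^+(i, r) if `plus` is True, else C^-(i, r), following recursions (37)-(38)."""
    if r == 0:
        return weights[i] if plus else 0.0   # boundary condition W_i or 0
    gone = set(removed) | {i}
    total = 0.0
    for v in sorted(adj[i]):
        if v in removed:
            continue
        total += cavity_truncated(adj, weights, v, r - 1, plus, frozenset(gone))
        gone.add(v)
    return max(0.0, weights[i] - total)


def exact_cavity(adj, weights, i):
    """Exact cavity: with depth len(adj) the boundary condition is never reached."""
    return cavity_truncated(adj, weights, i, len(adj), plus=False)


def sandwich_holds(adj, weights, i, r, tol=1e-12):
    """Check the ordering of Lemma 7 for node i and depth r."""
    c = exact_cavity(adj, weights, i)
    lo = cavity_truncated(adj, weights, i, r, plus=False)
    hi = cavity_truncated(adj, weights, i, r, plus=True)
    if r % 2 == 0:
        return lo <= c + tol and c <= hi + tol
    return hi <= c + tol and c <= lo + tol
```

As `r` grows the two truncated values approach each other; the rate at which they do so on random weights is exactly what the correlation decay analysis quantifies.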

We now describe our algorithm for producing a large weighted independent set. Our algorithm 
runs in two stages. Fix $\epsilon > 0$. In the first stage we take an input graph $\mathcal{G} = (V, E)$ and delete 
every node (and incident edges) with probability $\epsilon^2/16$, independently for all nodes. We denote the 
resulting (random) subgraph by $\mathcal{G}(\epsilon)$. In the second stage we compute $C^-_{\mathcal{G}(\epsilon)}(i,r)$ for every node $i$ 
for the graph $\mathcal{G}(\epsilon)$ for some target even number of steps $r$. We set $\mathcal{I}(r,\epsilon) = \{i : C^-_{\mathcal{G}(\epsilon)}(i,r) > 0\}$. 
Let $I^*_\epsilon$ be the largest weighted independent set of $\mathcal{G}(\epsilon)$. 

Lemma 8. $\mathcal{I}(r,\epsilon)$ is an independent set. 

Proof. By Lemma [7], if $C^-_{\mathcal{G}(\epsilon)}(i,r) > 0$ then $C_{\mathcal{G}(\epsilon)}(i) > 0$, and therefore $\mathcal{I}(r,\epsilon) \subset I^*_\epsilon$. Thus our algorithm 
produces an independent set in $\mathcal{G}(\epsilon)$ and therefore in $\mathcal{G}$. □ 
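The two stages described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the helper names (`ring_with_chords`, `cavity_minus`, `cavity_expansion_mwis`) and the test graph, a ring with diametral chords so that every node has degree 3, are hypothetical choices, and the deletion step is simulated with Python's `random` module.

```python
import random


def ring_with_chords(n):
    """Hypothetical test graph: an n-cycle (n even) plus chords v -- v + n/2."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for u in ((v + 1) % n, (v + n // 2) % n):
            adj[v].add(u)
            adj[u].add(v)
    return adj


def cavity_minus(adj, weights, i, r, removed=frozenset()):
    """C^-(i, r): recursion (37) with boundary value 0 at depth 0."""
    if r == 0:
        return 0.0
    gone = set(removed) | {i}
    total = 0.0
    for v in sorted(adj[i]):
        if v in removed:
            continue
        total += cavity_minus(adj, weights, v, r - 1, frozenset(gone))
        gone.add(v)
    return max(0.0, weights[i] - total)


def cavity_expansion_mwis(adj, weights, eps, r, seed=0):
    """Stage 1: delete each node w.p. eps^2/16. Stage 2: keep nodes with C^- > 0."""
    if r % 2 != 0:
        raise ValueError("r must be even so that C^- is a lower bound (Lemma 7)")
    rng = random.Random(seed)
    deleted = frozenset(v for v in adj if rng.random() < eps ** 2 / 16)
    return {
        i for i in adj
        if i not in deleted and cavity_minus(adj, weights, i, r, deleted) > 0
    }
```

Each node's decision uses only its depth-`r` neighborhood, which is what makes the procedure decentralized; the cost per node grows like $\Delta^r$, in line with the complexity estimate quoted below.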

We finish this section by mentioning that due to Proposition [4], the complexity of running both 
stages of the algorithm is $O(nr\Delta^r)$. As it will be apparent from the analysis, we could take $C^+_{\mathcal{G}(\epsilon)}$ 
instead of $C^-_{\mathcal{G}(\epsilon)}$ and arrive at the same result using an odd number $r$. 
7.2 Proof of Theorem [3] 

7.2.1 Correlation decay property 

The main bulk of the proof of Theorem [3] will be to show that $\mathcal{I}(r,\epsilon)$ is close to $I^*_\epsilon$ in the set-theoretic 
sense. We will use this to show that $W(\mathcal{I}(r,\epsilon))$ is close to $W(I^*_\epsilon)$. It will then be straightforward 
to show that $W(I^*_\epsilon)$ is close to $W(I^*)$, which will finally give us the desired result, Theorem [3]. The 
key step therefore consists in proving that the correlation decay property holds. It is the object of 
our next proposition. 

First, for any arbitrary induced subgraph $\mathcal{H}$ of $\mathcal{G}(\epsilon)$ and any node $i$ in $\mathcal{H}$, we introduce 
$M_{\mathcal{H}}(i) = E[\exp(-C_{\mathcal{H}}(i))]$, $M^-_{\mathcal{H}}(i,r) = E[\exp(-C^-_{\mathcal{H}}(i,r))]$, $M^+_{\mathcal{H}}(i,r) = E[\exp(-C^+_{\mathcal{H}}(i,r))]$. 

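Proposition 8. Let $\mathcal{G}(\epsilon) = (V_\epsilon, E_\epsilon)$ be the graph obtained from the original underlying graph as 
a result of the first phase of the algorithm (namely deleting every node with probability $\delta = \epsilon^2/16$ 
independently for all nodes). Then, for every node $i$ in $\mathcal{G}(\epsilon)$ and every $r$ 

$$P(C_{\mathcal{G}(\epsilon)}(i) = 0,\ C^+_{\mathcal{G}(\epsilon)}(i,2r) > 0) \le 3(1 - \epsilon^2/16)^{2r}, \quad (39)$$

and 

$$P(C_{\mathcal{G}(\epsilon)}(i) > 0,\ C^-_{\mathcal{G}(\epsilon)}(i,2r) = 0) \le 3(1 - \epsilon^2/16)^{2r}. \quad (40)$$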

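Proof. Consider a subgraph $\mathcal{H}$ of $\mathcal{G}$, node $i \in \mathcal{H}$ with neighbors $N_{\mathcal{H}}(i) = \{i_1, \dots, i_d\}$, and suppose 
for now that the number of neighbors of $i$ in $\mathcal{G}$ is at most 2. 

Examine the recursion (36) and observe that all the randomness in terms $C_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l)$ 
comes from the subgraph $\mathcal{H} \setminus \{i, i_1, \dots, i_{l-1}\}$, and thus $W_i$ is independent from the vector 
$(C_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l), 1 \le l \le d)$. A similar assertion applies when we replace $C_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l)$ with 
$C^-_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r)$ and $C^+_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r)$ for every $r$. Using the memoryless property of the 
exponential distribution, denoting $W$ a standard exponential random variable, we obtain: 

$$\begin{aligned}
E\Big[\exp(-C_{\mathcal{H}}(i)) \,\Big|\, \sum_{1 \le l \le d} C_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l) = x\Big] &= P(W_i \le x)E[\exp(0)] + E[\exp(-(W_i - x)) \mid W_i > x]\,P(W_i > x) \\
&= (1 - P(W_i > x)) + E[\exp(-W)]\,P(W_i > x) \\
&= (1 - P(W_i > x)) + (1/2)P(W_i > x) \\
&= 1 - (1/2)P(W_i > x) \\
&= 1 - (1/2)\exp(-x) \quad (41)
\end{aligned}$$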
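Identity (41) is easy to confirm numerically. The sketch below is a plain midpoint-rule quadrature, not a method from the paper; the function names `conditional_moment` and `closed_form`, the truncation point, and the step count are arbitrary illustrative choices.

```python
import math


def conditional_moment(x, upper=40.0, steps=200_000):
    """Midpoint-rule estimate of E[exp(-max(0, W - x))] for W ~ Exp(1)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h
        # Density exp(-w) of W times the integrand exp(-max(0, w - x)).
        total += math.exp(-w) * math.exp(-max(0.0, w - x))
    return total * h


def closed_form(x):
    """Right-hand side of (41)."""
    return 1.0 - 0.5 * math.exp(-x)
```

The factor 1/2 is $E[\exp(-W)]$ for a standard exponential $W$, which is where the memoryless property enters.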

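It follows 

$$E[\exp(-C_{\mathcal{H}}(i))] = 1 - (1/2)E\exp\Big(-\sum_{1 \le l \le d} C_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l)\Big).$$

Similarly we obtain 

$$E[\exp(-C^-_{\mathcal{H}}(i,r))] = 1 - (1/2)E\exp\Big(-\sum_{1 \le l \le d} C^-_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1)\Big),$$

$$E[\exp(-C^+_{\mathcal{H}}(i,r))] = 1 - (1/2)E\exp\Big(-\sum_{1 \le l \le d} C^+_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1)\Big).$$

Since $i$ had two neighbors or less in $\mathcal{G}$, it also has two neighbors or less in $\mathcal{H}$. For $d = 0$, we 
have trivially $M_{\mathcal{H}}(i) = M^-_{\mathcal{H}}(i,r) = M^+_{\mathcal{H}}(i,r)$. Suppose $d = 1$: $N_{\mathcal{H}}(i) = \{i_1\}$. Then, 

$$\begin{aligned}
M^-_{\mathcal{H}}(i,r) - M^+_{\mathcal{H}}(i,r) &= (1/2)\big(E[\exp(-C^+_{\mathcal{H}\setminus\{i\}}(i_1, r-1))] - E[\exp(-C^-_{\mathcal{H}\setminus\{i\}}(i_1, r-1))]\big) \\
&= (1/2)\big(M^+_{\mathcal{H}\setminus\{i\}}(i_1, r-1) - M^-_{\mathcal{H}\setminus\{i\}}(i_1, r-1)\big) \quad (42)
\end{aligned}$$

Finally, suppose $d = 2$: $N_{\mathcal{H}}(i) = \{i_1, i_2\}$. Then 

$$\begin{aligned}
M^-_{\mathcal{H}}(i,r) - M^+_{\mathcal{H}}(i,r) &= (1/2)E[\exp(-C^+_{\mathcal{H}\setminus\{i\}}(i_1, r-1) - C^+_{\mathcal{H}\setminus\{i,i_1\}}(i_2, r-1))] \\
&\quad - (1/2)E[\exp(-C^-_{\mathcal{H}\setminus\{i\}}(i_1, r-1) - C^-_{\mathcal{H}\setminus\{i,i_1\}}(i_2, r-1))] \\
&= (1/2)E\big[\exp(-C^+_{\mathcal{H}\setminus\{i\}}(i_1, r-1))\big(\exp(-C^+_{\mathcal{H}\setminus\{i,i_1\}}(i_2, r-1)) - \exp(-C^-_{\mathcal{H}\setminus\{i,i_1\}}(i_2, r-1))\big)\big] \\
&\quad + (1/2)E\big[\exp(-C^-_{\mathcal{H}\setminus\{i,i_1\}}(i_2, r-1))\big(\exp(-C^+_{\mathcal{H}\setminus\{i\}}(i_1, r-1)) - \exp(-C^-_{\mathcal{H}\setminus\{i\}}(i_1, r-1))\big)\big]
\end{aligned}$$

Using the non-negativity of $C^-, C^+$ and applying Lemma [7] we obtain for odd $r$ 

$$\begin{aligned}
0 \le M^+_{\mathcal{H}}(i,r) - M^-_{\mathcal{H}}(i,r) &\le (1/2)E[\exp(-C^-_{\mathcal{H}\setminus\{i,i_1\}}(i_2, r-1)) - \exp(-C^+_{\mathcal{H}\setminus\{i,i_1\}}(i_2, r-1))] \\
&\quad + (1/2)E[\exp(-C^-_{\mathcal{H}\setminus\{i\}}(i_1, r-1)) - \exp(-C^+_{\mathcal{H}\setminus\{i\}}(i_1, r-1))] \\
&= (1/2)\big(M^-_{\mathcal{H}\setminus\{i,i_1\}}(i_2, r-1) - M^+_{\mathcal{H}\setminus\{i,i_1\}}(i_2, r-1)\big) \\
&\quad + (1/2)\big(M^-_{\mathcal{H}\setminus\{i\}}(i_1, r-1) - M^+_{\mathcal{H}\setminus\{i\}}(i_1, r-1)\big) \quad (43)
\end{aligned}$$

and for even $r$ 

$$\begin{aligned}
0 \le M^-_{\mathcal{H}}(i,r) - M^+_{\mathcal{H}}(i,r) &\le (1/2)\big(M^+_{\mathcal{H}\setminus\{i,i_1\}}(i_2, r-1) - M^-_{\mathcal{H}\setminus\{i,i_1\}}(i_2, r-1)\big) \\
&\quad + (1/2)\big(M^+_{\mathcal{H}\setminus\{i\}}(i_1, r-1) - M^-_{\mathcal{H}\setminus\{i\}}(i_1, r-1)\big) \quad (44)
\end{aligned}$$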
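Summarizing the three cases we conclude 

$$|M^+_{\mathcal{H}}(i,r) - M^-_{\mathcal{H}}(i,r)| \le (d/2)\max_{\mathcal{H}',j} |M^+_{\mathcal{H}'}(j, r-1) - M^-_{\mathcal{H}'}(j, r-1)|, \quad (45)$$

where the maximum is over subgraphs $\mathcal{H}'$ of $\mathcal{G}$ and nodes $j \in \mathcal{H}'$ with degree at most 2 in $\mathcal{H}'$. The 
reason for this is that in equations (42), (43), and (44) the moments $M^{\pm}_{\mathcal{H}'}(j, r-1)$ in the right hand 
side are always computed at a node $j$ which has lost at least one of its neighbors (namely, $i$) in 
graph $\mathcal{H}$. Since the degree of $j$ was at most 3 in $\mathcal{G}$ and one neighbor at least is removed, $j$ has at 
most two neighbors in $\mathcal{H}'$. By considering $\mathcal{H} \cap \mathcal{G}(\epsilon)$ in all previous equations, equation (45) implies 

$$|M^+_{\mathcal{H}\cap\mathcal{G}(\epsilon)}(i,r) - M^-_{\mathcal{H}\cap\mathcal{G}(\epsilon)}(i,r)| \le (d(\epsilon)/2)\max_{\mathcal{H}',j} |M^+_{\mathcal{H}'\cap\mathcal{G}(\epsilon)}(j, r-1) - M^-_{\mathcal{H}'\cap\mathcal{G}(\epsilon)}(j, r-1)|, \quad (46)$$

where $d(\epsilon)$ denotes the number of neighbors of $i$ in $\mathcal{H} \cap \mathcal{G}(\epsilon)$. By definition of $\mathcal{G}(\epsilon)$, $d(\epsilon)$ is a 
binomial random variable with $d$ trials and probability of success $(1 - \epsilon^2/16)$, where $d$ is the degree 
of $i$ in $\mathcal{H}$. Since $d \le 2$, $E[d(\epsilon)] \le 2(1 - \epsilon^2/16)$. Moreover, this randomness is independent from the 
randomness of the random weights of $\mathcal{H}$. Therefore, 

$$E\big[|M^+_{\mathcal{H}\cap\mathcal{G}(\epsilon)}(i,r) - M^-_{\mathcal{H}\cap\mathcal{G}(\epsilon)}(i,r)|\big] \le (1 - \epsilon^2/16)\max_{\mathcal{H}',j} E\big|M^+_{\mathcal{H}'\cap\mathcal{G}(\epsilon)}(j, r-1) - M^-_{\mathcal{H}'\cap\mathcal{G}(\epsilon)}(j, r-1)\big| \quad (47)$$

where the external expectation is w.r.t. the randomness of the first phase of the algorithm (deleted 
nodes). Let $e_{r-1}$ denote the maximum in the right-hand side of (47). By taking the max of the left-hand side 
of (47) over all $(\mathcal{H}, j)$ where $j$ has degree less than or equal to 2 in $\mathcal{H}$, we obtain the inequality 
$e_r \le (1 - \epsilon^2/16)e_{r-1}$. Iterating on $r$ and using $0 \le M \le 1$, this implies that $e_r \le (1 - \epsilon^2/16)^r$ for 
all $r \ge 0$. Finally, it is easy to show using the same techniques that equation (45) holds for $d = 3$ 
as well. This finally implies that for an arbitrary node $i$ in $\mathcal{G}(\epsilon)$ 

$$E\big[|M^+_{\mathcal{G}(\epsilon)}(i,r) - M^-_{\mathcal{G}(\epsilon)}(i,r)|\big] \le (3/2)(1 - \epsilon^2/16)^r.$$

Applying Lemma [7] we conclude for every $r$ 

$$0 \le E[\exp(-C^-_{\mathcal{G}(\epsilon)}(i,2r)) - \exp(-C^+_{\mathcal{G}(\epsilon)}(i,2r))] \le (3/2)(1 - \epsilon^2/16)^{2r}.$$

Recalling (41) we have 

$$E[\exp(-C_{\mathcal{G}(\epsilon)}(i))] = 1 - (1/2)P\Big(W_i > \sum_{1 \le l \le d} C_{\mathcal{G}(\epsilon)\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l)\Big) = 1 - (1/2)P(C_{\mathcal{G}(\epsilon)}(i) > 0).$$

Similar expressions are valid for $C^-_{\mathcal{G}(\epsilon)}(i,r)$ and $C^+_{\mathcal{G}(\epsilon)}(i,r)$. We obtain 

$$0 \le P(C^-_{\mathcal{G}(\epsilon)}(i,2r) = 0) - P(C^+_{\mathcal{G}(\epsilon)}(i,2r) = 0) \le 3(1 - \epsilon^2/16)^{2r}.$$

Again applying Lemma [7] we obtain 

$$P(C_{\mathcal{G}(\epsilon)}(i) = 0,\ C^+_{\mathcal{G}(\epsilon)}(i,2r) > 0) \le P(C^-_{\mathcal{G}(\epsilon)}(i,2r) = 0,\ C^+_{\mathcal{G}(\epsilon)}(i,2r) > 0) \le 3(1 - \epsilon^2/16)^{2r},$$

and 

$$P(C_{\mathcal{G}(\epsilon)}(i) > 0,\ C^-_{\mathcal{G}(\epsilon)}(i,2r) = 0) \le P(C^-_{\mathcal{G}(\epsilon)}(i,2r) = 0,\ C^+_{\mathcal{G}(\epsilon)}(i,2r) > 0) \le 3(1 - \epsilon^2/16)^{2r}.$$

This completes the proof of the proposition. □ 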

7.2.2 Concentration argument 

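We can now complete the proof of Theorem [3]. We need to bound $|W(I^*) - W(I^*_\epsilon)|$ and $W(I^*_\epsilon \setminus \mathcal{I}(r,\epsilon))$ 
and show that both quantities are small. 

Let $\Delta V_\epsilon$ be the set of nodes in $\mathcal{G}$ which are not in $\mathcal{G}(\epsilon)$. Trivially, $|W(I^*) - W(I^*_\epsilon)| \le W(\Delta V_\epsilon)$. 
We have $E[|\Delta V_\epsilon|] = (\epsilon^2/16)n$, and since the nodes were deleted irrespectively of their weights, 
$E[W(\Delta V_\epsilon)] = (\epsilon^2/16)n$. 

To analyze $W(I^*_\epsilon \setminus \mathcal{I}(r,\epsilon))$, observe that by (second part of) Proposition [8], for every node 
$i$, $P(i \in I^*_\epsilon \setminus \mathcal{I}(r,\epsilon)) \le 3(1 - \epsilon^2/16)^r = \delta_1$. Thus $E|I^*_\epsilon \setminus \mathcal{I}(r,\epsilon)| \le \delta_1 n$. In order to obtain a bound on 
$W(I^*_\epsilon \setminus \mathcal{I}(r,\epsilon))$ we derive a crude bound on the largest weight of a subset with cardinality $\delta_1 n$. Fix 
a constant $C$ and consider the set $V_C$ of all nodes in $\mathcal{G}(\epsilon)$ with weights greater than $C$. We have 
$E[W(V_C)] \le (C + E[W - C \mid W > C])\exp(-C)n = (C + 1)\exp(-C)n$. The remaining nodes have 
a weight at most $C$. Therefore, 

$$\begin{aligned}
E[W(I^*_\epsilon \setminus \mathcal{I}(r,\epsilon))] &\le E[W(((I^*_\epsilon \setminus \mathcal{I}(r,\epsilon)) \cap V_C^c) \cup V_C)] \le C\,E[|I^*_\epsilon \setminus \mathcal{I}(r,\epsilon)|] + E[W(V_C)] \\
&\le C\delta_1 n + (C + 1)\exp(-C)n.
\end{aligned}$$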
We conclude 

$$E[|W(I^*) - W(\mathcal{I}(r,\epsilon))|] \le (\epsilon^2/16)n + C\delta_1 n + (C + 1)\exp(-C)n. \quad (48)$$

Now we obtain a lower bound on $W(I^*)$. Consider the standard greedy algorithm for generating an 
independent set: take an arbitrary node, remove its neighbors, and repeat. It is well known and simple to 
see that this algorithm produces an independent set with cardinality at least $n/4$, since the largest 
degree is at most 3. Since the algorithm ignores the weights, the expected weight of this 
set is also at least $n/4$. The variance of that weight is upper bounded by $n$. By Chebyshev's inequality 

$$P(W(I^*) < n/8) \le \frac{n}{(n/4 - n/8)^2} = 64/n.$$
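The greedy argument can be made concrete with a few lines of code. The following is a sketch under illustrative assumptions: the function names and the degree-3 test graph (a ring with diametral chords) are hypothetical, and the processing order (increasing node label) is arbitrary, as the argument allows.

```python
def ring_with_chords(n):
    """Hypothetical degree-3 test graph: an n-cycle (n even) plus chords v -- v + n/2."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for u in ((v + 1) % n, (v + n // 2) % n):
            adj[v].add(u)
            adj[u].add(v)
    return adj


def greedy_independent_set(adj):
    """Pick a remaining node, discard it and its neighbors, repeat."""
    remaining = set(adj)
    chosen = set()
    for v in sorted(adj):
        if v in remaining:
            chosen.add(v)
            # Each pick eliminates at most (max degree + 1) nodes.
            remaining -= adj[v] | {v}
    return chosen
```

Because each pick eliminates at most $\Delta + 1$ nodes, the output has at least $n/(\Delta + 1)$ nodes, which is $n/4$ when the largest degree is 3. The selection never looks at the weights, so with i.i.d. exp(1) weights the expected weight of the output equals its cardinality.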

We now summarize the results. 

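$$\begin{aligned}
P\Big(\frac{W(\mathcal{I}(r,\epsilon))}{W(I^*)} < 1 - \epsilon\Big) &\le P\Big(\frac{W(\mathcal{I}(r,\epsilon))}{W(I^*)} < 1 - \epsilon,\ W(I^*) \ge n/8\Big) + P(W(I^*) < n/8) \\
&\le P\Big(\frac{|W(I^*) - W(\mathcal{I}(r,\epsilon))|}{W(I^*)} > \epsilon,\ W(I^*) \ge n/8\Big) + 64/n \\
&\le P\Big(\frac{|W(I^*) - W(\mathcal{I}(r,\epsilon))|}{n/8} > \epsilon\Big) + 64/n \\
&\le \frac{\epsilon^2/16 + 4C(1 - \epsilon^2/16)^r + (C + 1)\exp(-C)}{\epsilon/8} + 64/n,
\end{aligned}$$

where we have used Markov's inequality in the last step and $\delta_1 = 3(1 - \epsilon^2/16)^r \le 4(1 - \epsilon^2/16)^r$. Thus it suffices to 
arrange $C$ so that the first ratio is at most $2\epsilon/3$ and assuming, without loss of generality, that 
$n \ge 192/\epsilon$, we will obtain that the sum is at most $\epsilon$. It is a simple exercise to show that by taking 
$r = O(\log(1/\epsilon)/\epsilon^2)$ and $C = O(\log(1/\epsilon))$, we obtain the required result. This completes the proof 
of Theorem [3]. □ 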



7.3 Generalization to higher degrees. Proof of Theorem [4] 

In this section we present the proof of Theorem [4]. The mixture of $\Delta$ exponential distributions with 
rates $a_j, 1 \le j \le \Delta$ and equal weights $1/\Delta$ can be viewed as first randomly generating a rate $a$ 
with the probability law $P(a = a_j) = 1/\Delta$ and then randomly generating an exponentially distributed 
random variable with rate $a_j$, conditional on the rate being $a_j$. 

For every subgraph $\mathcal{H}$ of $\mathcal{G}$, node $i$ in $\mathcal{H}$ and $j = 1, \dots, \Delta$, define $M^j_{\mathcal{H}}(i) = E[\exp(-a_j C_{\mathcal{H}}(i))]$, 
$M^{-,j}_{\mathcal{H}}(i,r) = E[\exp(-a_j C^-_{\mathcal{H}}(i,r))]$ and $M^{+,j}_{\mathcal{H}}(i,r) = E[\exp(-a_j C^+_{\mathcal{H}}(i,r))]$, where $C_{\mathcal{H}}(i)$, $C^+_{\mathcal{H}}(i,r)$ 
and $C^-_{\mathcal{H}}(i,r)$ are defined as in Section [7.1]. 

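Lemma 9. Fix any subgraph $\mathcal{H}$, node $i$ with $N_{\mathcal{H}}(i) = \{i_1, \dots, i_d\}$. Then 

$$E[\exp(-a_j C_{\mathcal{H}}(i))] = 1 - \frac{1}{\Delta}\sum_{1 \le k \le m} \frac{a_j}{a_j + a_k} E\Big[\exp\Big(-\sum_{1 \le l \le d} a_k C_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l)\Big)\Big]$$

$$E[\exp(-a_j C^+_{\mathcal{H}}(i,r))] = 1 - \frac{1}{\Delta}\sum_{1 \le k \le m} \frac{a_j}{a_j + a_k} E\Big[\exp\Big(-\sum_{1 \le l \le d} a_k C^+_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1)\Big)\Big]$$

$$E[\exp(-a_j C^-_{\mathcal{H}}(i,r))] = 1 - \frac{1}{\Delta}\sum_{1 \le k \le m} \frac{a_j}{a_j + a_k} E\Big[\exp\Big(-\sum_{1 \le l \le d} a_k C^-_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1)\Big)\Big]$$

Proof. Let $a(i)$ be the random rate associated with node $i$. Namely, $P(a(i) = a_j) = 1/\Delta$. We 
condition on the event $\sum_{1 \le l \le d} C_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l) = x$. As $C_{\mathcal{H}}(i) = \max(0, W_i - x)$, we obtain: 

$$\begin{aligned}
E[\exp(-a_j C_{\mathcal{H}}(i)) \mid x] &= \frac{1}{\Delta}\sum_k E[\exp(-a_j C_{\mathcal{H}}(i)) \mid x, a(i) = a_k] \\
&= \frac{1}{\Delta}\sum_k \Big(P(W_i \le x \mid a(i) = a_k) + P(W_i > x \mid a(i) = a_k)\,E[\exp(-a_j(W_i - x)) \mid W_i > x, a(i) = a_k]\Big) \\
&= \frac{1}{\Delta}\sum_k \Big(1 - \exp(-a_k x) + \exp(-a_k x)\frac{a_k}{a_j + a_k}\Big) \\
&= 1 - \frac{1}{\Delta}\sum_k \frac{a_j}{a_j + a_k}\exp(-a_k x)
\end{aligned}$$

Thus, 

$$E[\exp(-a_j C_{\mathcal{H}}(i))] = 1 - \frac{1}{\Delta}\sum_k \frac{a_j}{a_j + a_k} E\Big[\exp\Big(-\sum_{1 \le l \le d} a_k C_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l)\Big)\Big]$$

The other equalities follow identically. □ 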
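By taking differences, we obtain 

$$M^{-,j}_{\mathcal{H}}(i,r) - M^{+,j}_{\mathcal{H}}(i,r) = \frac{1}{\Delta}\sum_k \frac{a_j}{a_j + a_k} E\Big[\prod_{1 \le l \le d} \exp(-a_k C^+_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1)) - \prod_{1 \le l \le d} \exp(-a_k C^-_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1))\Big]$$

We now use the identity 

$$\prod_{1 \le l \le r} x_l - \prod_{1 \le l \le r} y_l = \sum_{1 \le l \le r} \Big(\prod_{1 \le k \le l-1} x_k\Big)(x_l - y_l)\Big(\prod_{l+1 \le k \le r} y_k\Big)$$

which further implies 

$$\Big|\prod_{1 \le l \le r} x_l - \prod_{1 \le l \le r} y_l\Big| \le \sum_{1 \le l \le r} |x_l - y_l|$$

when $\max_l |x_l| \le 1$ and $\max_l |y_l| \le 1$. By applying this inequality with $x_l = \exp(-a_k C^+_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1))$ 
and $y_l = \exp(-a_k C^-_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1))$, we obtain 

$$|M^{-,j}_{\mathcal{H}}(i,r) - M^{+,j}_{\mathcal{H}}(i,r)| \le \frac{1}{\Delta}\sum_{1 \le k \le m} \frac{a_j}{a_j + a_k} \sum_{1 \le l \le d} |M^{-,k}_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1) - M^{+,k}_{\mathcal{H}\setminus\{i,i_1,\dots,i_{l-1}\}}(i_l, r-1)|$$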

This implies 

\M~'^(i,r) - M:t'\i,r)\ < ^ V — max |M~{^.. . . Mi,r - 1) - M:t{) ■ ■ ■ , (i;,r - 1)1.(49) 

l<fc<m 
For any $r \ge 0$ and $j$, define $e_{r,j}$ as follows 

$$e_{r,j} = \sup_{\mathcal{H} \subset \mathcal{G},\, i \in \mathcal{H}} |M^{-,j}_{\mathcal{H}}(i,r) - M^{+,j}_{\mathcal{H}}(i,r)| \quad (50)$$

By taking maximum on the right and left hand side successively, inequality (49) implies 



l<k<m 



For any $r \ge 0$, denote $e_r$ the vector $(e_{r,1}, \dots, e_{r,m})$. Denote $M$ the matrix such that for all $(j, k)$, 
^j,k = s ^-fe- We finally obtain 

$$e_r \le M e_{r-1}.$$

Therefore, if $M^r$ converges to zero exponentially fast in each coordinate, then also $e_r$ converges 
exponentially fast to 0. Following the same steps as the proof of Theorem [3], this will imply that 
for each node, the error of a decision made in X(r, 0) is exponentially small in r . Note that ^ < 1. 

Recall that $a_j = \rho^j$. Therefore, for each $j, k$, we have $M_{j,k} \le \frac{\rho^j}{\rho^j + \rho^k}$. Define $M_\Delta$ to be a $\Delta \times \Delta$ 
matrix defined by $M_{j,j} = 1/2$, $M_{j,k} = 1$, $j > k$ and $M_{j,k} = (1/\rho)^{k-j}$, $k > j$, for all $1 \le j, k \le \Delta$. 
Since $M \le M_\Delta$, it suffices to show that $M_\Delta^r$ converges to zero exponentially fast. The proof of Theorem 
[4] will thus be completed with the proof of the following lemma: 

Lemma 10. Under the condition $\rho > 25$, there exists $\delta = \delta(\rho) < 1$ such that the absolute value of 
every entry of $M_\Delta^r$ is at most $\delta^r(\rho)$. 

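Proof. Let $\epsilon = 1/\rho$. Since the elements of $M_\Delta$ are non-negative, it suffices to exhibit a strictly positive 
vector $x = x(\rho)$ and $0 < \theta = \theta(\rho) < 1$ such that $M_\Delta' x \le \theta x$, where $M_\Delta'$ is the transpose of $M_\Delta$. Let $x$ be 
the vector defined by $x_k = \epsilon^{k/2}$, $1 \le k \le \Delta$. We show that for any $j$, 

$$(M_\Delta' x)_j \le \Big(\frac{1}{2} + \frac{2\sqrt{\epsilon}}{1 - \sqrt{\epsilon}}\Big)x_j.$$

It is easy to verify that when $\rho > 25$, that is $\epsilon < 1/25$, $\big(1/2 + \frac{2\sqrt{\epsilon}}{1 - \sqrt{\epsilon}}\big) < 1$, and the proof would be 
complete. Fix $1 \le j \le \Delta$. Then, 

$$\begin{aligned}
(M_\Delta' x)_j &= \sum_{1 \le k \le j-1} M_{k,j}x_k + (1/2)x_j + \sum_{j+1 \le k \le \Delta} M_{k,j}x_k \\
&= \sum_{1 \le k \le j-1} \epsilon^{j-k}\epsilon^{k/2} + (1/2)\epsilon^{j/2} + \sum_{j+1 \le k \le \Delta} \epsilon^{k/2}
\end{aligned}$$

Since $x_j = \epsilon^{j/2}$, we have 

$$\frac{(M_\Delta' x)_j}{x_j} = 1/2 + \sum_{1 \le k \le j-1} \epsilon^{(j-k)/2} + \sum_{j+1 \le k \le \Delta} \epsilon^{(k-j)/2} \le \frac{1}{2} + \frac{2\sqrt{\epsilon}}{1 - \sqrt{\epsilon}}$$

This completes the proof of the lemma and of the theorem. □ 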
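The contraction inequality used in the proof of Lemma 10 can be checked numerically for specific values of $\rho$ and $\Delta$. The sketch below is an illustration only; the names `bounding_matrix` and `contraction_check`, the relative tolerance, and the sample parameter values are assumptions made for the example.

```python
import math


def bounding_matrix(delta, rho):
    """Matrix with entries 1/2 on the diagonal, 1 below it, (1/rho)^(k-j) above it."""
    m = [[0.0] * delta for _ in range(delta)]
    for j in range(1, delta + 1):
        for k in range(1, delta + 1):
            if j == k:
                value = 0.5
            elif j > k:
                value = 1.0
            else:
                value = (1.0 / rho) ** (k - j)
            m[j - 1][k - 1] = value
    return m


def contraction_check(delta, rho):
    """Return (theta, ok) where ok means (M' x)_j <= theta * x_j for every j."""
    eps = 1.0 / rho
    theta = 0.5 + 2 * math.sqrt(eps) / (1 - math.sqrt(eps))
    m = bounding_matrix(delta, rho)
    x = [eps ** (k / 2) for k in range(1, delta + 1)]
    # (M' x)_j = sum over k of M[k][j] * x[k], using the transpose of M.
    mtx = [sum(m[k][j] * x[k] for k in range(delta)) for j in range(delta)]
    ok = all(mtx[j] <= theta * x[j] * (1 + 1e-12) for j in range(delta))
    return theta, ok
```

For instance, `contraction_check(6, 26.0)` returns a `theta` slightly below 1, which shows how tight the threshold $\rho > 25$ is for this particular choice of test vector.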
7.4 Hardness result. Proof of Theorem [5] 

The main idea of the proof is to show that the weight of a maximum weighted independent set is 
close to the cardinality of a maximum independent set. A similar proof idea was used in [LV97] for 
proving the hardness of approximately counting independent sets in sparse graphs. 

Given a graph $\mathcal{G}$ with degree bounded by $\Delta$, let $I^M$ denote (any) maximum cardinality independent 
set, and let $I^*$ denote the unique maximum weight independent set corresponding to i.i.d. 
weights with exp(1) distribution. We make use of the following result due to Trevisan [Tre01]. 

Theorem 9. There exist $\Delta_0$ and $c^*$ such that for all $\Delta \ge \Delta_0$ the problem of approximating the 
largest independent set in graphs with degree at most $\Delta$ to within a factor $\rho = \Delta/2^{c^*\sqrt{\log \Delta}}$ is 
NP-complete. 

Our main technical result is the following proposition. It states that the ratio of the expected 
weight of a maximum weight independent set to the cardinality of a maximum independent set 
grows at most as the logarithm of the maximum degree of the graph. 

Proposition 9. Suppose $\Delta \ge 2$. For every graph $\mathcal{G}$ with maximum degree $\Delta$ and $n$ large enough, 
we have: 

$$1 \le \frac{E[W(I^*)]}{|I^M|} \le 10\log\Delta.$$

This in combination with Theorem [9] leads to the desired result. 

Proof. Let $W(1) < W(2) < \dots < W(n)$ be the ordered weights associated with our graph $\mathcal{G}$. 
Observe that 

$$\begin{aligned}
E[W(I^*)] &= E\Big[\sum_{i \in I^*} W_i\Big] \\
&\le E\Big[\sum_{j = n - |I^*| + 1}^{n} W(j)\Big] \\
&\le E\Big[\sum_{j = n - |I^M| + 1}^{n} W(j)\Big].
\end{aligned}$$

The exponential distribution implies $E[W(j)] = H(n) - H(n - j)$, where $H(k)$ is the harmonic 
sum $\sum_{1 \le i \le k} 1/i$. Thus 

$$\begin{aligned}
\sum_{j = n - |I^M| + 1}^{n} E[W(j)] &= \sum_{n - |I^M| + 1 \le j \le n} (H(n) - H(n - j)) \\
&= |I^M| H(n) - \sum_{j \le |I^M| - 1} H(j)
\end{aligned}$$

We use the bound $\log(k) \le H(k) - \gamma \le \log(k) + 1$, where $\gamma \approx .57$ is Euler's constant. Then 

$$\begin{aligned}
\sum_{j = n - |I^M| + 1}^{n} E[W(j)] &\le |I^M|(H(n) - \gamma) + \log(|I^M|) + 2 - \sum_{1 \le j \le |I^M|} \log(j) \\
&\le |I^M|(H(n) - \gamma) + \log(|I^M|) + 2 - \int_1^{|I^M|} \log(t)\,dt \\
&\le |I^M|\log(n) + |I^M| + \log(|I^M|) + 2 - |I^M|\log(|I^M|) + |I^M| \\
&\le (|I^M| + 1)\Big(\log\frac{n}{|I^M|} + 2 + \log(|I^M|)/|I^M|\Big) \\
&\le |I^M|(\log(\Delta + 1) + 3) + (\log(\Delta + 1) + 3),
\end{aligned}$$

where the bound $|I^M| \ge n/(\Delta + 1)$ (obtained by using the greedy algorithm, see Section [7.2.2]) is 
used. Again using the bound $|I^M| \ge n/(\Delta + 1)$, we find that $\frac{E[W(I^*)]}{|I^M|} \le \log(\Delta + 1) + 3 + o(1)$. 
Since $E[W(I^*)] \ge E[W(I^M)] = |I^M|$, it follows that for all sufficiently large $n$, $1 \le \frac{E[W(I^*)]}{|I^M|} \le 
\log(\Delta + 1) + 4$. The proposition follows since for all $\Delta \ge 2$ we have $\log(\Delta + 1) + 4 \le 10\log\Delta$. □ 
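The order-statistics computation in the proof can be reproduced numerically. The sketch below is illustrative: the function names are hypothetical, and the values of $n$ and $\Delta$ used in any check are arbitrary. It evaluates the expected sum of the $m$ largest of $n$ i.i.d. exp(1) weights using $E[W(j)] = H(n) - H(n-j)$, and compares it with the closed form $m H(n) - \sum_{j \le m-1} H(j)$.

```python
import math


def harmonic(k):
    """Harmonic sum H(k) = 1 + 1/2 + ... + 1/k, with H(0) = 0."""
    return sum(1.0 / i for i in range(1, k + 1))


def expected_order_stat(n, j):
    """E[W(j)] for the j-th smallest of n i.i.d. exp(1) weights."""
    return harmonic(n) - harmonic(n - j)


def expected_top_sum(n, m):
    """Expected total weight of the m largest of n i.i.d. exp(1) weights."""
    return sum(expected_order_stat(n, j) for j in range(n - m + 1, n + 1))


def closed_form_top_sum(n, m):
    """m * H(n) minus the sum of H(j) for 0 <= j <= m - 1."""
    return m * harmonic(n) - sum(harmonic(j) for j in range(m))


def ratio_bound(delta):
    """The bound log(delta + 1) + 4 on E[W(I*)] / |I^M| from the proof."""
    return math.log(delta + 1) + 4
```

With $m = n/(\Delta + 1)$, the smallest cardinality the greedy bound allows for $I^M$, the ratio `expected_top_sum(n, m) / m` stays well below `ratio_bound(delta)`, which in turn is at most $10\log\Delta$ for every $\Delta \ge 2$.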



8 Conclusion 

We considered an optimization model which encompasses many models from the literature including 
graphical models, combinatorial optimization and economics. In our model, cooperating agents 
within a networked structure choose decisions from a finite set of actions and seek to collectively 
optimize a global welfare objective function, which can be additively decomposed on the nodes and 
edges of the network. The main goal is to answer whether it is possible to find near optimal solutions 
efficiently, and if possible using distributed algorithms relying only on local information. Despite 
the apparent NP-hardness of such a problem even in the approximation setting, we find that in 
a framework where cost functions are random, this goal is often achievable. Specifically, we have 
constructed a general purpose algorithm Cavity Expansion, which relies on the local information 
only, and thus is distributed. We have established that under the so-called correlation decay 
property, our algorithm finds a near optimal solution with high probability. We have identified 
a variety of models which exhibit the correlation decay property and we have proposed general 
purpose techniques, such as the coupling technique, which we used to prove the correlation decay 
property. 

Our results highlight interesting and intriguing connections between the fields of complexity of 
algorithms for combinatorial optimization problems and statistical physics, specifically the cavity 
method and the issues of long-range independence. For example, in the special case of the MWIS 
problem we showed that the problem admits a PTAS, provided by the CE algorithm, for certain 
node weight distributions, even though the maximum cardinality version of the same problem is 
known to be non-approximable unless P=NP. 

It would be interesting to see what weight distributions are amenable to the approach proposed 
in this paper. For example, one could consider the case of Bernoulli weights and see whether 
the correlation decay property breaks down precisely when the approximation becomes NP-hard. 
Furthermore, it would be interesting to see if the random weights assumption for general decision 
networks can be substituted with deterministic weights which have some random-like properties, 
in a fashion similar to the study of pseudo-random graphs. This would move our approach even 
closer to the worst-case combinatorial optimization setting. 

The framework studied here can be further extended in several additional ways. First, we can 
consider a network of agents who, instead of cooperating, behave selfishly. Using ideas similar to 
those presented in this paper, we believe it is possible to identify settings where, using distributed 
procedures representing communication between the agents, one can find in polynomial time a Nash 
equilibrium of the underlying system. Second, one can consider a dynamical setting where agents 
take repeated actions that affect both their reward and their future state. This class of models, 
known as factored Markov Decision Processes, has a very large number of applications (supply 
chain, communication networks, and many others), but optimality bounds have been identified 
only in very restricted settings. Again, concepts such as correlation decay may be found useful to 
approach these problems and identify new settings where the solution can be found in polynomial 
time, despite the curse of dimensionality typically exhibited by these models. 